2023-12-20 17:30:48,671 INFO [train.py:953] (1/4) Training started
2023-12-20 17:30:48,671 INFO [train.py:963] (1/4) Device: cuda:1
2023-12-20 17:30:48,671 INFO [train.py:965] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '0.0.0+unknown.version', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'audio_tagging', 'icefall-git-sha1': 'bd01c212-clean', 'icefall-git-date': 'Tue Dec 19 17:20:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_audio_tagging', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/lhotse_development/lhotse_at/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-7-1218101249-5bcbfb5567-jsftr', 'IP address': '10.177.6.147'}, 'world_size': 4, 'master_port': 13455, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp_at_as_full'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'num_events': 527, 'audioset_subset': 'full', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures'}
2023-12-20 17:30:48,672 INFO [train.py:967] (1/4) About to create model
2023-12-20 17:30:53,840 INFO [train.py:971] (1/4) Number of model parameters: 64264454
2023-12-20 17:30:56,725 INFO [train.py:986] (1/4) Using DDP
2023-12-20 17:30:57,442 INFO [at_datamodule.py:398] (1/4) About to get the audioset cuts for KD.
2023-12-20 17:30:57,498 INFO [at_datamodule.py:223] (1/4) Enable MUSAN
2023-12-20 17:30:57,498 INFO [at_datamodule.py:224] (1/4) About to get Musan cuts
2023-12-20 17:30:59,783 INFO [at_datamodule.py:248] (1/4) Enable SpecAugment
2023-12-20 17:30:59,783 INFO [at_datamodule.py:249] (1/4) Time warp factor: 80
2023-12-20 17:30:59,784 INFO [at_datamodule.py:259] (1/4) Num frame mask: 10
2023-12-20 17:30:59,784 INFO [at_datamodule.py:272] (1/4) About to create train dataset
2023-12-20 17:30:59,784 INFO [at_datamodule.py:299] (1/4) Using DynamicBucketingSampler.
2023-12-20 17:31:01,662 INFO [at_datamodule.py:315] (1/4) About to create train dataloader
2023-12-20 17:31:01,663 INFO [at_datamodule.py:410] (1/4) About to get test-other cuts
2023-12-20 17:31:01,664 INFO [at_datamodule.py:346] (1/4) About to create dev dataset
2023-12-20 17:31:02,110 INFO [at_datamodule.py:363] (1/4) About to create dev dataloader
2023-12-20 17:31:25,014 INFO [train.py:886] (1/4) Epoch 1, batch 0, loss[loss=1.835, audio_tagging_loss=1.835, over 24132.00 frames. ], tot_loss[loss=1.835, audio_tagging_loss=1.835, over 24132.00 frames. ], batch size: 100, lr: 2.25e-02, grad_scale: 2.0
2023-12-20 17:31:25,014 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:31:46,187 INFO [train.py:917] (1/4) Epoch 1, validation: loss=1.716, audio_tagging_loss=1.716, over 3737520.00 frames.
2023-12-20 17:31:46,188 INFO [train.py:918] (1/4) Maximum memory allocated so far is 13125MB
2023-12-20 17:31:50,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=0.0, ans=0.5
2023-12-20 17:31:50,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=7.5
2023-12-20 17:31:56,787 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+02 8.568e+02 1.002e+03 1.369e+03 1.715e+03, threshold=4.006e+03, percent-clipped=0.0
2023-12-20 17:31:59,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.87 vs. limit=7.55
2023-12-20 17:32:01,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=66.66666666666667, ans=0.8976666666666667
2023-12-20 17:32:03,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=137.40 vs. limit=5.016666666666667
2023-12-20 17:32:06,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=338.86 vs. limit=5.033333333333333
2023-12-20 17:32:07,423 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 3.256e+02 7.044e+02 1.161e+03 1.783e+03, threshold=2.818e+03, percent-clipped=0.0
2023-12-20 17:32:08,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=238.11 vs. limit=7.55
2023-12-20 17:32:10,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=231.09 vs. limit=7.6
2023-12-20 17:32:17,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=511.64 vs. limit=7.65
2023-12-20 17:32:20,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=382.71 vs. limit=7.65
2023-12-20 17:32:22,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=302.92 vs. limit=7.575
2023-12-20 17:32:28,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=129.21 vs. limit=4.08
2023-12-20 17:32:30,896 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.273e+01 1.290e+02 2.793e+02 8.337e+02 1.783e+03, threshold=1.117e+03, percent-clipped=0.0
2023-12-20 17:32:39,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=266.6666666666667, ans=3.04
2023-12-20 17:32:42,072 INFO [train.py:886] (1/4) Epoch 1, batch 50, loss[loss=0.05699, audio_tagging_loss=0.05699, over 25000.00 frames. ], tot_loss[loss=0.3011, audio_tagging_loss=0.3011, over 1123484.19 frames. ], batch size: 100, lr: 2.48e-02, grad_scale: 2.0
2023-12-20 17:33:00,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=135.29 vs. limit=7.76
2023-12-20 17:33:07,711 INFO [train.py:886] (1/4) Epoch 2, batch 0, loss[loss=0.05759, audio_tagging_loss=0.05759, over 25000.00 frames. ], tot_loss[loss=0.05759, audio_tagging_loss=0.05759, over 25000.00 frames. ], batch size: 100, lr: 2.44e-02, grad_scale: 4.0
2023-12-20 17:33:07,712 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:33:15,997 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1000, 5.3174, 4.9056, 5.2685], device='cuda:1')
2023-12-20 17:33:23,307 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.8108, 4.8136, 4.8126, 4.8180], device='cuda:1')
2023-12-20 17:33:28,177 INFO [train.py:917] (1/4) Epoch 2, validation: loss=0.0597, audio_tagging_loss=0.0597, over 3737520.00 frames.
2023-12-20 17:33:28,177 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:33:39,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=95.67 vs. limit=5.1033333333333335
2023-12-20 17:33:42,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=77.45 vs. limit=4.165333333333333
2023-12-20 17:33:42,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=413.3333333333333, ans=7.81
2023-12-20 17:33:44,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=324.85 vs. limit=7.81
2023-12-20 17:33:51,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=480.0, ans=0.4775
2023-12-20 17:33:57,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=342.98 vs. limit=7.68
2023-12-20 17:34:01,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=299.82 vs. limit=7.86
2023-12-20 17:34:16,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=211.96 vs. limit=7.96
2023-12-20 17:34:24,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=613.3333333333334, ans=0.8785333333333334
2023-12-20 17:34:25,459 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.968e+01 6.154e+01 2.791e+02 2.019e+03, threshold=1.231e+02, percent-clipped=1.0
2023-12-20 17:34:26,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=258.13 vs. limit=8.01
2023-12-20 17:34:26,584 INFO [train.py:886] (1/4) Epoch 2, batch 50, loss[loss=0.0579, audio_tagging_loss=0.0579, over 25000.00 frames. ], tot_loss[loss=0.05954, audio_tagging_loss=0.05954, over 1113463.61 frames. ], batch size: 100, lr: 2.66e-02, grad_scale: 2.0
2023-12-20 17:34:26,778 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:34:26,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=205.70 vs. limit=7.755
2023-12-20 17:34:44,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=693.3333333333334, ans=0.8757333333333334
2023-12-20 17:34:44,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=298.12 vs. limit=7.76
2023-12-20 17:34:52,016 INFO [train.py:886] (1/4) Epoch 3, batch 0, loss[loss=0.06998, audio_tagging_loss=0.06998, over 21459.00 frames. ], tot_loss[loss=0.06998, audio_tagging_loss=0.06998, over 21459.00 frames. ], batch size: 106, lr: 2.54e-02, grad_scale: 4.0
2023-12-20 17:34:52,017 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:35:12,452 INFO [train.py:917] (1/4) Epoch 3, validation: loss=0.05878, audio_tagging_loss=0.05878, over 3737520.00 frames.
2023-12-20 17:35:12,453 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:35:12,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=169.18 vs. limit=7.76
2023-12-20 17:35:13,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=693.3333333333334, ans=0.17400000000000002
2023-12-20 17:35:21,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=254.30 vs. limit=8.02
2023-12-20 17:35:24,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=105.70 vs. limit=5.38
2023-12-20 17:35:29,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=760.0, ans=5.475
2023-12-20 17:35:30,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=295.13 vs. limit=7.785
2023-12-20 17:35:33,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760.0, ans=0.464375
2023-12-20 17:35:33,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=214.83 vs. limit=7.785
2023-12-20 17:35:33,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=33.33 vs. limit=7.785
2023-12-20 17:35:35,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=826.6666666666666, ans=0.8710666666666667
2023-12-20 17:35:40,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=218.09 vs. limit=5.413333333333333
2023-12-20 17:35:41,371 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-12-20 17:35:43,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=7.81
2023-12-20 17:35:44,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=327.16 vs. limit=8.12
2023-12-20 17:35:46,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=315.28 vs. limit=7.81
2023-12-20 17:35:49,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=893.3333333333334, ans=0.458125
2023-12-20 17:35:55,312 WARNING [optim.py:500] (1/4) Scaling gradients by 0.09217905253171921, model_norm_threshold=123.07855224609375
2023-12-20 17:35:55,459 WARNING [optim.py:572] (1/4) Parameter dominating tot_sumsq module.encoder_embed.conv.7.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.614e+05, grad_sumsq=6.752e+08, orig_rms_sq=1.276e-03
2023-12-20 17:36:01,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=21.13 vs. limit=7.86
2023-12-20 17:36:05,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=4.384
2023-12-20 17:36:09,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=960.0, ans=0.164
2023-12-20 17:36:11,141 INFO [train.py:886] (1/4) Epoch 3, batch 50, loss[loss=0.05275, audio_tagging_loss=0.05275, over 25000.00 frames. ], tot_loss[loss=0.05574, audio_tagging_loss=0.05574, over 1121484.75 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:36:35,793 INFO [train.py:886] (1/4) Epoch 4, batch 0, loss[loss=0.05912, audio_tagging_loss=0.05912, over 25000.00 frames. ], tot_loss[loss=0.05912, audio_tagging_loss=0.05912, over 25000.00 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 8.0
2023-12-20 17:36:35,794 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:36:54,955 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3721, 5.0210, 4.1473, 4.7986], device='cuda:1')
2023-12-20 17:36:55,851 INFO [train.py:917] (1/4) Epoch 4, validation: loss=0.05673, audio_tagging_loss=0.05673, over 3737520.00 frames.
2023-12-20 17:36:55,852 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:37:11,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=123.07 vs. limit=5.553333333333334
2023-12-20 17:37:12,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.65 vs. limit=3.166
2023-12-20 17:37:23,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1173.3333333333333, ans=0.156
2023-12-20 17:37:25,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1173.3333333333333, ans=0.8589333333333333
2023-12-20 17:37:25,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=140.15 vs. limit=5.586666666666667
2023-12-20 17:37:33,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=4.496
2023-12-20 17:37:34,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1240.0, ans=0.28759999999999997
2023-12-20 17:37:35,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=195.92 vs. limit=7.965
2023-12-20 17:37:45,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1306.6666666666667, ans=0.151
2023-12-20 17:37:48,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=139.40 vs. limit=5.653333333333333
2023-12-20 17:37:49,921 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+01 2.504e+01 2.720e+01 3.182e+01 1.335e+03, threshold=5.440e+01, percent-clipped=1.0
2023-12-20 17:37:52,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=338.24 vs. limit=7.99
2023-12-20 17:37:54,273 INFO [train.py:886] (1/4) Epoch 4, batch 50, loss[loss=0.05116, audio_tagging_loss=0.05116, over 25000.00 frames. ], tot_loss[loss=0.05483, audio_tagging_loss=0.05483, over 1117747.78 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 4.0
2023-12-20 17:38:12,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=153.26 vs. limit=8.54
2023-12-20 17:38:19,534 INFO [train.py:886] (1/4) Epoch 5, batch 0, loss[loss=0.05337, audio_tagging_loss=0.05337, over 24154.00 frames. ], tot_loss[loss=0.05337, audio_tagging_loss=0.05337, over 24154.00 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 8.0
2023-12-20 17:38:19,535 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:38:39,892 INFO [train.py:917] (1/4) Epoch 5, validation: loss=0.05523, audio_tagging_loss=0.05523, over 3737520.00 frames.
2023-12-20 17:38:39,893 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14681MB
2023-12-20 17:38:46,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.99 vs. limit=5.693333333333333
2023-12-20 17:38:48,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1386.6666666666667, ans=0.435
2023-12-20 17:38:52,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=24.62 vs. limit=8.045
2023-12-20 17:38:56,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1453.3333333333333, ans=0.28546666666666665
2023-12-20 17:38:58,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=150.98 vs. limit=8.59
2023-12-20 17:39:04,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=105.50 vs. limit=8.64
2023-12-20 17:39:08,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=36.14 vs. limit=4.304
2023-12-20 17:39:08,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=347.71 vs. limit=8.07
2023-12-20 17:39:10,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1520.0, ans=0.42875
2023-12-20 17:39:11,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=271.18 vs. limit=8.07
2023-12-20 17:39:11,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1520.0, ans=0.31
2023-12-20 17:39:14,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=63.68 vs. limit=8.07
2023-12-20 17:39:16,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=350.03 vs. limit=8.095
2023-12-20 17:39:20,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=91.05 vs. limit=8.095
2023-12-20 17:39:26,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=94.86 vs. limit=8.12
2023-12-20 17:39:28,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.39 vs. limit=8.74
2023-12-20 17:39:29,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=80.44 vs. limit=8.12
2023-12-20 17:39:32,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=329.69 vs. limit=8.12
2023-12-20 17:39:34,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1653.3333333333333, ans=0.044833333333333336
2023-12-20 17:39:38,884 INFO [train.py:886] (1/4) Epoch 5, batch 50, loss[loss=0.04927, audio_tagging_loss=0.04927, over 25000.00 frames. ], tot_loss[loss=0.05231, audio_tagging_loss=0.05231, over 1126559.95 frames. ], batch size: 100, lr: 2.77e-02, grad_scale: 8.0
2023-12-20 17:40:04,929 INFO [train.py:886] (1/4) Epoch 6, batch 0, loss[loss=0.05394, audio_tagging_loss=0.05394, over 24115.00 frames. ], tot_loss[loss=0.05394, audio_tagging_loss=0.05394, over 24115.00 frames. ], batch size: 100, lr: 2.59e-02, grad_scale: 16.0
2023-12-20 17:40:04,929 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:40:21,308 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1342, 4.5939, 4.2046, 3.9363], device='cuda:1')
2023-12-20 17:40:25,825 INFO [train.py:917] (1/4) Epoch 6, validation: loss=0.05425, audio_tagging_loss=0.05425, over 3737520.00 frames.
2023-12-20 17:40:25,826 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14782MB
2023-12-20 17:40:28,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=1733.3333333333333, ans=0.08916666666666667
2023-12-20 17:40:33,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=16.42 vs. limit=5.433333333333334
2023-12-20 17:40:35,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=4.693333333333333
2023-12-20 17:40:37,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1800.0, ans=0.415625
2023-12-20 17:40:39,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=4.72
2023-12-20 17:40:51,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=20.00 vs. limit=5.466666666666667
2023-12-20 17:40:53,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1866.6666666666667, ans=0.4125
2023-12-20 17:40:53,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1866.6666666666667, ans=0.8346666666666667
2023-12-20 17:41:00,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1933.3333333333333, ans=0.18211666666666668
2023-12-20 17:41:02,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=20.02 vs. limit=5.483333333333333
2023-12-20 17:41:06,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=184.24 vs. limit=8.225
2023-12-20 17:41:07,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=4.773333333333333
2023-12-20 17:41:08,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1933.3333333333333, ans=0.1275
2023-12-20 17:41:08,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=109.20 vs. limit=8.95
2023-12-20 17:41:09,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1933.3333333333333, ans=0.8323333333333334
2023-12-20 17:41:11,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2000.0, ans=0.40625
2023-12-20 17:41:14,959 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.090e+01 2.556e+01 2.831e+01 3.472e+01 7.747e+01, threshold=5.662e+01, percent-clipped=6.0
2023-12-20 17:41:19,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2000.0, ans=0.055
2023-12-20 17:41:19,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=220.80 vs. limit=8.25
2023-12-20 17:41:22,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=79.27 vs. limit=6.0
2023-12-20 17:41:23,849 INFO [train.py:886] (1/4) Epoch 6, batch 50, loss[loss=0.04985, audio_tagging_loss=0.04985, over 25000.00 frames. ], tot_loss[loss=0.05205, audio_tagging_loss=0.05205, over 1119666.96 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 16.0
2023-12-20 17:41:24,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=26.65 vs. limit=4.826666666666666
2023-12-20 17:41:49,203 INFO [train.py:886] (1/4) Epoch 7, batch 0, loss[loss=0.06588, audio_tagging_loss=0.06588, over 20889.00 frames. ], tot_loss[loss=0.06588, audio_tagging_loss=0.06588, over 20889.00 frames. ], batch size: 106, lr: 2.60e-02, grad_scale: 32.0
2023-12-20 17:41:49,203 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:42:09,828 INFO [train.py:917] (1/4) Epoch 7, validation: loss=0.05269, audio_tagging_loss=0.05269, over 3737520.00 frames.
2023-12-20 17:42:09,828 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:42:11,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2080.0, ans=0.40249999999999997
2023-12-20 17:42:13,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=40.80 vs. limit=8.28
2023-12-20 17:42:13,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=208.37 vs. limit=8.28
2023-12-20 17:42:14,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=39.95 vs. limit=5.0
2023-12-20 17:42:27,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.39 vs. limit=8.305
2023-12-20 17:42:32,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=168.62 vs. limit=9.11
2023-12-20 17:42:45,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=36.42 vs. limit=6.14
2023-12-20 17:42:55,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=165.89 vs. limit=8.38
2023-12-20 17:42:57,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2346.6666666666665, ans=0.20666666666666667
2023-12-20 17:42:59,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=12.13 vs. limit=5.586666666666667
2023-12-20 17:43:00,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=103.45 vs. limit=9.26
2023-12-20 17:43:01,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=106.55 vs. limit=9.26
2023-12-20 17:43:06,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=39.82 vs. limit=8.405
2023-12-20 17:43:07,586 INFO [train.py:886] (1/4) Epoch 7, batch 50, loss[loss=0.04806, audio_tagging_loss=0.04806, over 25000.00 frames. ], tot_loss[loss=0.05213, audio_tagging_loss=0.05213, over 1118761.40 frames. ], batch size: 100, lr: 2.76e-02, grad_scale: 1.0
2023-12-20 17:43:07,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2413.3333333333335, ans=0.38687499999999997
2023-12-20 17:43:32,846 INFO [train.py:886] (1/4) Epoch 8, batch 0, loss[loss=0.05969, audio_tagging_loss=0.05969, over 21042.00 frames. ], tot_loss[loss=0.05969, audio_tagging_loss=0.05969, over 21042.00 frames. ], batch size: 106, lr: 2.60e-02, grad_scale: 2.0
2023-12-20 17:43:32,847 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:43:47,919 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.8688, 4.3972, 3.5764, 3.4130], device='cuda:1')
2023-12-20 17:43:53,652 INFO [train.py:917] (1/4) Epoch 8, validation: loss=0.05155, audio_tagging_loss=0.05155, over 3737520.00 frames.
2023-12-20 17:43:53,653 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:43:55,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=59.31 vs. limit=9.32
2023-12-20 17:43:58,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=175.19 vs. limit=8.41
2023-12-20 17:44:04,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=41.73 vs. limit=8.41
2023-12-20 17:44:12,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=3.374
2023-12-20 17:44:22,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=33.31 vs. limit=9.42
2023-12-20 17:44:23,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=135.86 vs. limit=8.46
2023-12-20 17:44:23,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=26.90 vs. limit=6.28
2023-12-20 17:44:36,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2626.6666666666665, ans=0.10149999999999999
2023-12-20 17:44:39,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.14 vs. limit=5.673333333333334
2023-12-20 17:44:42,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=2693.3333333333335, ans=0.22306666666666666
2023-12-20 17:44:42,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2693.3333333333335, ans=8.51
2023-12-20 17:44:43,338 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+01 3.487e+01 4.265e+01 5.657e+01 4.687e+02, threshold=8.530e+01, percent-clipped=24.0
2023-12-20 17:44:51,056 INFO [train.py:886] (1/4) Epoch 8, batch 50, loss[loss=0.04785, audio_tagging_loss=0.04785, over 25000.00 frames. ], tot_loss[loss=0.04955, audio_tagging_loss=0.04955, over 1114180.34 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 2.0
2023-12-20 17:44:51,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=110.51 vs. limit=9.57
2023-12-20 17:45:09,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.02 vs. limit=6.386666666666667
2023-12-20 17:45:16,350 INFO [train.py:886] (1/4) Epoch 9, batch 0, loss[loss=0.05045, audio_tagging_loss=0.05045, over 24090.00 frames. ], tot_loss[loss=0.05045, audio_tagging_loss=0.05045, over 24090.00 frames. ], batch size: 100, lr: 2.61e-02, grad_scale: 4.0
2023-12-20 17:45:16,350 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:45:37,429 INFO [train.py:917] (1/4) Epoch 9, validation: loss=0.04977, audio_tagging_loss=0.04977, over 3737520.00 frames.
2023-12-20 17:45:37,429 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:45:37,785 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=90.75 vs. limit=9.58
2023-12-20 17:45:38,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=40.18 vs. limit=8.54
2023-12-20 17:45:53,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=86.22 vs. limit=9.629999999999999
2023-12-20 17:46:04,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2906.6666666666665, ans=0.091
2023-12-20 17:46:12,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=29.26 vs. limit=9.73
2023-12-20 17:46:15,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2973.3333333333335, ans=0.08849999999999998
2023-12-20 17:46:21,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=74.98 vs. limit=8.64
2023-12-20 17:46:24,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3040.0, ans=0.076
2023-12-20 17:46:25,870 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=4.656e+00
2023-12-20 17:46:32,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3106.6666666666665, ans=0.08058333333333334
2023-12-20 17:46:33,268 INFO [train.py:886] (1/4) Epoch 9, batch 50, loss[loss=0.04912, audio_tagging_loss=0.04912, over 25000.00 frames. ], tot_loss[loss=0.04777, audio_tagging_loss=0.04777, over 1119621.40 frames. ], batch size: 100, lr: 2.75e-02, grad_scale: 4.0
2023-12-20 17:46:52,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=5.78
2023-12-20 17:46:59,486 INFO [train.py:886] (1/4) Epoch 10, batch 0, loss[loss=0.04918, audio_tagging_loss=0.04918, over 25000.00 frames. ], tot_loss[loss=0.04918, audio_tagging_loss=0.04918, over 25000.00 frames. ], batch size: 100, lr: 2.62e-02, grad_scale: 8.0
2023-12-20 17:46:59,487 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:47:20,694 INFO [train.py:917] (1/4) Epoch 10, validation: loss=0.04858, audio_tagging_loss=0.04858, over 3737520.00 frames.
2023-12-20 17:47:20,695 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:47:21,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.67 vs. limit=6.5600000000000005
2023-12-20 17:47:21,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=102.46 vs. limit=8.67
2023-12-20 17:47:31,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3186.6666666666665, ans=0.21813333333333335
2023-12-20 17:47:31,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.60 vs. limit=6.593333333333334
2023-12-20 17:47:36,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3186.6666666666665, ans=0.35062499999999996
2023-12-20 17:47:36,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3186.6666666666665, ans=0.35062499999999996
2023-12-20 17:47:47,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3253.3333333333335, ans=0.26746666666666663
2023-12-20 17:47:51,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.78 vs. limit=8.72
2023-12-20 17:47:53,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.99 vs. limit=5.328
2023-12-20 17:47:56,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=8.745000000000001
2023-12-20 17:48:00,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.79 vs. limit=5.83
2023-12-20 17:48:00,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.16 vs. limit=9.99
2023-12-20 17:48:01,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=59.47 vs. limit=8.745000000000001
2023-12-20 17:48:02,275 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.056e+00
2023-12-20 17:48:04,134 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.619e+01 3.726e+01 4.484e+01 5.424e+01 1.858e+02, threshold=8.969e+01, percent-clipped=3.0
2023-12-20 17:48:06,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=5.354666666666667
2023-12-20 17:48:06,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.02 vs. limit=10.04
2023-12-20 17:48:12,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.68 vs. limit=10.04
2023-12-20 17:48:15,991 INFO [train.py:886] (1/4) Epoch 10, batch 50, loss[loss=0.04728, audio_tagging_loss=0.04728, over 25000.00 frames. ], tot_loss[loss=0.04679, audio_tagging_loss=0.04679, over 1116564.21 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 8.0
2023-12-20 17:48:40,825 INFO [train.py:886] (1/4) Epoch 11, batch 0, loss[loss=0.05327, audio_tagging_loss=0.05327, over 21049.00 frames. ], tot_loss[loss=0.05327, audio_tagging_loss=0.05327, over 21049.00 frames. ], batch size: 106, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:48:40,825 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:48:53,030 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4948, 2.5191, 2.8846, 2.5371], device='cuda:1')
2023-12-20 17:49:01,997 INFO [train.py:917] (1/4) Epoch 11, validation: loss=0.04728, audio_tagging_loss=0.04728, over 3737520.00 frames.
2023-12-20 17:49:01,998 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:49:06,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3466.6666666666665, ans=7.166666666666666
2023-12-20 17:49:22,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-20 17:49:23,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3533.3333333333335, ans=0.334375
2023-12-20 17:49:25,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3600.0, ans=10.2
2023-12-20 17:49:26,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=35.67 vs. limit=8.85
2023-12-20 17:49:26,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=16.88 vs. limit=8.85
2023-12-20 17:49:30,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3600.0, ans=0.06499999999999997
2023-12-20 17:49:30,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=8.85
2023-12-20 17:49:31,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3600.0, ans=0.33125
2023-12-20 17:49:32,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3600.0, ans=0.33125
2023-12-20 17:49:36,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3666.6666666666665, ans=0.328125
2023-12-20 17:49:38,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=43.03 vs. limit=8.875
2023-12-20 17:49:38,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=10.25
2023-12-20 17:49:51,441 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=28.44 vs. limit=6.866666666666667
2023-12-20 17:49:53,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.51 vs. limit=10.3
2023-12-20 17:49:56,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3733.3333333333335, ans=0.07
2023-12-20 17:49:58,310 INFO [train.py:886] (1/4) Epoch 11, batch 50, loss[loss=0.04305, audio_tagging_loss=0.04305, over 25000.00 frames. ], tot_loss[loss=0.04557, audio_tagging_loss=0.04557, over 1113738.85 frames. ], batch size: 100, lr: 2.58e-02, grad_scale: 16.0
2023-12-20 17:50:23,184 INFO [train.py:886] (1/4) Epoch 12, batch 0, loss[loss=0.04627, audio_tagging_loss=0.04627, over 25000.00 frames. ], tot_loss[loss=0.04627, audio_tagging_loss=0.04627, over 25000.00 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:50:23,184 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:50:36,844 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.3127, 1.5376, 1.3518, 1.2430], device='cuda:1')
2023-12-20 17:50:44,480 INFO [train.py:917] (1/4) Epoch 12, validation: loss=0.04619, audio_tagging_loss=0.04619, over 3737520.00 frames.
2023-12-20 17:50:44,480 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:50:47,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=26.68 vs. limit=8.93
2023-12-20 17:50:48,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=15.31 vs. limit=8.93
2023-12-20 17:50:52,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=8.93
2023-12-20 17:50:58,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3880.0, ans=0.318125
2023-12-20 17:50:59,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=8.955
2023-12-20 17:51:07,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.49 vs. limit=6.973333333333333
2023-12-20 17:51:12,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.09 vs. limit=10.46
2023-12-20 17:51:14,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=8.98
2023-12-20 17:51:24,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4013.3333333333335, ans=0.311875
2023-12-20 17:51:24,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=35.79 vs. limit=9.004999999999999
2023-12-20 17:51:25,063 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+01 3.849e+01 4.841e+01 5.572e+01 8.770e+01, threshold=9.682e+01, percent-clipped=0.0
2023-12-20 17:51:26,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=27.31 vs. limit=10.51
2023-12-20 17:51:29,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.53 vs. limit=7.04
2023-12-20 17:51:29,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.23 vs. limit=7.04
2023-12-20 17:51:35,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4080.0, ans=0.04966666666666667
2023-12-20 17:51:39,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=5.632
2023-12-20 17:51:40,930 INFO [train.py:886] (1/4) Epoch 12, batch 50, loss[loss=0.04324, audio_tagging_loss=0.04324, over 25000.00 frames. ], tot_loss[loss=0.04382, audio_tagging_loss=0.04382, over 1123444.17 frames. ], batch size: 100, lr: 2.47e-02, grad_scale: 32.0
2023-12-20 17:52:04,705 INFO [train.py:886] (1/4) Epoch 13, batch 0, loss[loss=0.04383, audio_tagging_loss=0.04383, over 24057.00 frames. ], tot_loss[loss=0.04383, audio_tagging_loss=0.04383, over 24057.00 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:52:04,706 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:52:12,860 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.6706, 1.7295, 1.9306, 1.5983], device='cuda:1')
2023-12-20 17:52:25,609 INFO [train.py:917] (1/4) Epoch 13, validation: loss=0.04525, audio_tagging_loss=0.04525, over 3737520.00 frames.
2023-12-20 17:52:25,610 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:52:25,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4160.0, ans=0.04933333333333333
2023-12-20 17:52:36,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4226.666666666667, ans=0.2577333333333333
2023-12-20 17:52:40,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=9.085
2023-12-20 17:52:52,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=26.59 vs. limit=10.719999999999999
2023-12-20 17:52:52,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=10.719999999999999
2023-12-20 17:52:58,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4360.0, ans=0.295625
2023-12-20 17:52:58,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=48.43 vs. limit=9.135
2023-12-20 17:53:00,497 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.133e+01
2023-12-20 17:53:01,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=46.56 vs. limit=9.135
2023-12-20 17:53:04,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=9.135
2023-12-20 17:53:14,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=6.1066666666666665
2023-12-20 17:53:15,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.36 vs. limit=7.213333333333333
2023-12-20 17:53:19,087 INFO [train.py:886] (1/4) Epoch 13, batch 50, loss[loss=0.04217, audio_tagging_loss=0.04217, over 25000.00 frames. ], tot_loss[loss=0.04317, audio_tagging_loss=0.04317, over 1118034.72 frames. ], batch size: 100, lr: 2.38e-02, grad_scale: 32.0
2023-12-20 17:53:43,851 INFO [train.py:886] (1/4) Epoch 14, batch 0, loss[loss=0.04432, audio_tagging_loss=0.04432, over 24126.00 frames. ], tot_loss[loss=0.04432, audio_tagging_loss=0.04432, over 24126.00 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:53:43,852 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:53:54,752 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.5347, 2.3677, 2.5217, 2.6816], device='cuda:1')
2023-12-20 17:54:05,169 INFO [train.py:917] (1/4) Epoch 14, validation: loss=0.04503, audio_tagging_loss=0.04503, over 3737520.00 frames.
2023-12-20 17:54:05,170 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:54:10,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=9.19
2023-12-20 17:54:35,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.59 vs. limit=10.98
2023-12-20 17:54:37,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4706.666666666667, ans=0.27937500000000004
2023-12-20 17:54:37,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=9.265
2023-12-20 17:54:37,831 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=9.265
2023-12-20 17:54:38,363 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.429e+01 4.195e+01 5.214e+01 6.348e+01 1.962e+02, threshold=1.043e+02, percent-clipped=5.0
2023-12-20 17:54:40,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=9.265
2023-12-20 17:54:57,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4840.0, ans=0.273125
2023-12-20 17:54:58,024 INFO [train.py:886] (1/4) Epoch 14, batch 50, loss[loss=0.04078, audio_tagging_loss=0.04078, over 25000.00 frames. ], tot_loss[loss=0.04263, audio_tagging_loss=0.04263, over 1119562.76 frames. ], batch size: 100, lr: 2.29e-02, grad_scale: 32.0
2023-12-20 17:55:22,493 INFO [train.py:886] (1/4) Epoch 15, batch 0, loss[loss=0.04204, audio_tagging_loss=0.04204, over 25000.00 frames. ], tot_loss[loss=0.04204, audio_tagging_loss=0.04204, over 25000.00 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
2023-12-20 17:55:22,494 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:55:43,373 INFO [train.py:917] (1/4) Epoch 15, validation: loss=0.04452, audio_tagging_loss=0.04452, over 3737520.00 frames.
2023-12-20 17:55:43,373 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:55:44,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4853.333333333333, ans=0.20146666666666668
2023-12-20 17:55:44,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4853.333333333333, ans=9.32
2023-12-20 17:55:47,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=34.74 vs. limit=9.32
2023-12-20 17:55:49,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=25.22 vs. limit=9.32
2023-12-20 17:55:56,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=9.345
2023-12-20 17:56:01,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=11.19
2023-12-20 17:56:11,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.36 vs. limit=6.246666666666667
2023-12-20 17:56:17,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5053.333333333333, ans=0.263125
2023-12-20 17:56:17,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=9.395
2023-12-20 17:56:23,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=9.395
2023-12-20 17:56:27,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=9.42
2023-12-20 17:56:31,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5120.0, ans=0.09899494936611666
2023-12-20 17:56:35,411 INFO [train.py:886] (1/4) Epoch 15, batch 50, loss[loss=0.0422, audio_tagging_loss=0.0422, over 25000.00 frames. ], tot_loss[loss=0.04144, audio_tagging_loss=0.04144, over 1124893.53 frames. ], batch size: 100, lr: 2.21e-02, grad_scale: 32.0
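Each train.py:886 line pairs the current batch's loss with tot_loss, a frames-weighted running average over the epoch so far. The fractional frame totals (e.g. 1124893.53 after 50 batches of ~25000 frames) suggest the accumulators are decayed rather than simply summed; the decay constant in this sketch is a guess.

```python
# Hedged sketch of a frames-weighted running loss like the tot_loss fields
# above; the exponential decay constant is a guess, not train.py's value.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # reported as tot_loss

tracker = RunningLoss()
for _ in range(50):
    tot = tracker.update(0.041, 25000.0)  # per-batch loss over ~25k frames
```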
2023-12-20 17:57:00,247 INFO [train.py:886] (1/4) Epoch 16, batch 0, loss[loss=0.04654, audio_tagging_loss=0.04654, over 22112.00 frames. ], tot_loss[loss=0.04654, audio_tagging_loss=0.04654, over 22112.00 frames. ], batch size: 106, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:57:00,248 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:57:13,238 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9660, 1.7526, 2.0088, 1.8561], device='cuda:1')
2023-12-20 17:57:21,259 INFO [train.py:917] (1/4) Epoch 16, validation: loss=0.04383, audio_tagging_loss=0.04383, over 3737520.00 frames.
2023-12-20 17:57:21,259 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:57:26,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5200.0, ans=0.25625
2023-12-20 17:57:27,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=11.4
2023-12-20 17:57:28,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5200.0, ans=0.278
2023-12-20 17:57:29,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=53.44 vs. limit=9.45
2023-12-20 17:57:33,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.84 vs. limit=9.475
2023-12-20 17:57:38,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=11.45
2023-12-20 17:57:42,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=11.5
2023-12-20 17:57:43,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=9.5
2023-12-20 17:57:47,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=11.5
2023-12-20 17:57:49,876 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.797e+01 3.933e+01 4.813e+01 5.766e+01 2.623e+02, threshold=9.626e+01, percent-clipped=4.0
2023-12-20 17:57:51,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=9.5
2023-12-20 17:58:06,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5466.666666666667, ans=0.24375000000000002
2023-12-20 17:58:12,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5466.666666666667, ans=0.24375000000000002
2023-12-20 17:58:14,014 INFO [train.py:886] (1/4) Epoch 16, batch 50, loss[loss=0.03823, audio_tagging_loss=0.03823, over 25000.00 frames. ], tot_loss[loss=0.04063, audio_tagging_loss=0.04063, over 1120529.57 frames. ], batch size: 100, lr: 2.14e-02, grad_scale: 32.0
2023-12-20 17:58:38,071 INFO [train.py:886] (1/4) Epoch 17, batch 0, loss[loss=0.0434, audio_tagging_loss=0.0434, over 24114.00 frames. ], tot_loss[loss=0.0434, audio_tagging_loss=0.0434, over 24114.00 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 17:58:38,072 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 17:58:46,293 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.8670, 1.1711, 1.7423, 1.7265], device='cuda:1')
2023-12-20 17:58:59,165 INFO [train.py:917] (1/4) Epoch 17, validation: loss=0.04362, audio_tagging_loss=0.04362, over 3737520.00 frames.
2023-12-20 17:58:59,166 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 17:59:10,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5613.333333333333, ans=0.236875
2023-12-20 17:59:12,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=11.71
2023-12-20 17:59:17,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5613.333333333333, ans=0.0
2023-12-20 17:59:17,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5613.333333333333, ans=0.8061333333333334
2023-12-20 17:59:18,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=11.76
2023-12-20 17:59:35,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5746.666666666667, ans=0.0
2023-12-20 17:59:49,922 INFO [train.py:886] (1/4) Epoch 17, batch 50, loss[loss=0.03733, audio_tagging_loss=0.03733, over 25000.00 frames. ], tot_loss[loss=0.0399, audio_tagging_loss=0.0399, over 1120885.05 frames. ], batch size: 100, lr: 2.07e-02, grad_scale: 32.0
2023-12-20 18:00:14,303 INFO [train.py:886] (1/4) Epoch 18, batch 0, loss[loss=0.04102, audio_tagging_loss=0.04102, over 24118.00 frames. ], tot_loss[loss=0.04102, audio_tagging_loss=0.04102, over 24118.00 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:00:14,303 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:00:35,062 INFO [train.py:917] (1/4) Epoch 18, validation: loss=0.04342, audio_tagging_loss=0.04342, over 3737520.00 frames.
2023-12-20 18:00:35,063 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:00:45,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5960.0, ans=0.2404
2023-12-20 18:00:49,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=9.735
2023-12-20 18:00:53,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.76 vs. limit=7.98
2023-12-20 18:00:58,723 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.069e+01 3.667e+01 4.319e+01 5.687e+01 1.553e+02, threshold=8.639e+01, percent-clipped=3.0
2023-12-20 18:01:00,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6026.666666666667, ans=0.23973333333333333
2023-12-20 18:01:01,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=12.02
2023-12-20 18:01:08,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6093.333333333333, ans=0.23906666666666665
2023-12-20 18:01:18,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=6160.0, ans=0.21125
2023-12-20 18:01:18,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=9.81
2023-12-20 18:01:21,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=6160.0, ans=0.041
2023-12-20 18:01:25,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=6.490666666666667
2023-12-20 18:01:25,748 INFO [train.py:886] (1/4) Epoch 18, batch 50, loss[loss=0.03687, audio_tagging_loss=0.03687, over 25000.00 frames. ], tot_loss[loss=0.03919, audio_tagging_loss=0.03919, over 1123284.48 frames. ], batch size: 100, lr: 2.01e-02, grad_scale: 32.0
2023-12-20 18:01:50,820 INFO [train.py:886] (1/4) Epoch 19, batch 0, loss[loss=0.05174, audio_tagging_loss=0.05174, over 20735.00 frames. ], tot_loss[loss=0.05174, audio_tagging_loss=0.05174, over 20735.00 frames. ], batch size: 106, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:01:50,821 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:02:11,830 INFO [train.py:917] (1/4) Epoch 19, validation: loss=0.04287, audio_tagging_loss=0.04287, over 3737520.00 frames.
2023-12-20 18:02:11,831 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:02:12,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=6240.0, ans=0.20750000000000002
2023-12-20 18:02:28,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=22.02 vs. limit=9.865
2023-12-20 18:02:29,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=5.261333333333333
2023-12-20 18:02:41,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=9.915
2023-12-20 18:02:58,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=6506.666666666667, ans=0.195
2023-12-20 18:03:02,034 INFO [train.py:886] (1/4) Epoch 19, batch 50, loss[loss=0.0379, audio_tagging_loss=0.0379, over 25000.00 frames. ], tot_loss[loss=0.03867, audio_tagging_loss=0.03867, over 1113120.79 frames. ], batch size: 100, lr: 1.96e-02, grad_scale: 32.0
2023-12-20 18:03:26,286 INFO [train.py:886] (1/4) Epoch 20, batch 0, loss[loss=0.04763, audio_tagging_loss=0.04763, over 20688.00 frames. ], tot_loss[loss=0.04763, audio_tagging_loss=0.04763, over 20688.00 frames. ], batch size: 106, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:03:26,287 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:03:47,096 INFO [train.py:917] (1/4) Epoch 20, validation: loss=0.0429, audio_tagging_loss=0.0429, over 3737520.00 frames.
2023-12-20 18:03:47,097 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:03:59,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=6653.333333333333, ans=0.188125
2023-12-20 18:04:03,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.63 vs. limit=12.49
2023-12-20 18:04:04,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.75 vs. limit=9.995000000000001
2023-12-20 18:04:06,505 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.851e+01 3.799e+01 4.551e+01 5.624e+01 1.513e+02, threshold=9.102e+01, percent-clipped=5.0
2023-12-20 18:04:19,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=6786.666666666667, ans=0.181875
2023-12-20 18:04:23,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=6786.666666666667, ans=0.6624666666666666
2023-12-20 18:04:28,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=6853.333333333333, ans=0.17875000000000002
2023-12-20 18:04:31,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=6853.333333333333, ans=0.6601333333333333
2023-12-20 18:04:34,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=10.07
2023-12-20 18:04:37,015 INFO [train.py:886] (1/4) Epoch 20, batch 50, loss[loss=0.03946, audio_tagging_loss=0.03946, over 25000.00 frames. ], tot_loss[loss=0.03792, audio_tagging_loss=0.03792, over 1119689.44 frames. ], batch size: 100, lr: 1.91e-02, grad_scale: 32.0
2023-12-20 18:04:59,861 INFO [train.py:886] (1/4) Epoch 21, batch 0, loss[loss=0.03549, audio_tagging_loss=0.03549, over 25000.00 frames. ], tot_loss[loss=0.03549, audio_tagging_loss=0.03549, over 25000.00 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:04:59,862 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:05:20,819 INFO [train.py:917] (1/4) Epoch 21, validation: loss=0.0427, audio_tagging_loss=0.0427, over 3737520.00 frames.
2023-12-20 18:05:20,819 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:05:31,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7000.0, ans=0.0
2023-12-20 18:05:35,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=29.45 vs. limit=12.75
2023-12-20 18:05:36,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=10.125
2023-12-20 18:05:50,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=7066.666666666667, ans=9.416666666666668
2023-12-20 18:06:10,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=12.95
2023-12-20 18:06:10,787 INFO [train.py:886] (1/4) Epoch 21, batch 50, loss[loss=0.03565, audio_tagging_loss=0.03565, over 25000.00 frames. ], tot_loss[loss=0.03737, audio_tagging_loss=0.03737, over 1116419.59 frames. ], batch size: 100, lr: 1.86e-02, grad_scale: 32.0
2023-12-20 18:06:34,955 INFO [train.py:886] (1/4) Epoch 22, batch 0, loss[loss=0.04177, audio_tagging_loss=0.04177, over 24182.00 frames. ], tot_loss[loss=0.04177, audio_tagging_loss=0.04177, over 24182.00 frames. ], batch size: 100, lr: 1.82e-02, grad_scale: 32.0
2023-12-20 18:06:34,956 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:06:48,027 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1941, 2.0806, 2.3632, 2.1985], device='cuda:1')
2023-12-20 18:06:55,957 INFO [train.py:917] (1/4) Epoch 22, validation: loss=0.04259, audio_tagging_loss=0.04259, over 3737520.00 frames.
2023-12-20 18:06:55,958 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:06:58,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=7280.0, ans=0.036333333333333336
2023-12-20 18:06:59,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=12.96
2023-12-20 18:07:01,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=7280.0, ans=0.15875
2023-12-20 18:07:09,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=10.254999999999999
2023-12-20 18:07:10,812 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.833e+01 3.757e+01 4.513e+01 5.428e+01 2.125e+02, threshold=9.026e+01, percent-clipped=5.0
2023-12-20 18:07:31,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=7480.0, ans=0.0
2023-12-20 18:07:31,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=10.305
2023-12-20 18:07:37,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=7546.666666666667, ans=0.035222222222222224
2023-12-20 18:07:44,532 INFO [train.py:886] (1/4) Epoch 22, batch 50, loss[loss=0.03513, audio_tagging_loss=0.03513, over 25000.00 frames. ], tot_loss[loss=0.03668, audio_tagging_loss=0.03668, over 1115831.91 frames. ], batch size: 100, lr: 1.81e-02, grad_scale: 32.0
2023-12-20 18:07:44,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=7.045333333333334
2023-12-20 18:08:02,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=13.219999999999999
2023-12-20 18:08:08,666 INFO [train.py:886] (1/4) Epoch 23, batch 0, loss[loss=0.0415, audio_tagging_loss=0.0415, over 21567.00 frames. ], tot_loss[loss=0.0415, audio_tagging_loss=0.0415, over 21567.00 frames. ], batch size: 106, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:08:08,666 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:08:30,060 INFO [train.py:917] (1/4) Epoch 23, validation: loss=0.04291, audio_tagging_loss=0.04291, over 3737520.00 frames.
2023-12-20 18:08:30,061 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:08:31,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=7626.666666666667, ans=10.36
2023-12-20 18:08:32,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=7626.666666666667, ans=0.14250000000000002
2023-12-20 18:08:46,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=13.27
2023-12-20 18:08:47,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=10.385
2023-12-20 18:08:57,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=7760.0, ans=0.13624999999999998
2023-12-20 18:08:57,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=10.41
2023-12-20 18:09:17,953 INFO [train.py:886] (1/4) Epoch 23, batch 50, loss[loss=0.03463, audio_tagging_loss=0.03463, over 25000.00 frames. ], tot_loss[loss=0.03506, audio_tagging_loss=0.03506, over 1123814.02 frames. ], batch size: 100, lr: 1.77e-02, grad_scale: 32.0
2023-12-20 18:09:40,309 INFO [train.py:886] (1/4) Epoch 24, batch 0, loss[loss=0.04475, audio_tagging_loss=0.04475, over 21395.00 frames. ], tot_loss[loss=0.04475, audio_tagging_loss=0.04475, over 21395.00 frames. ], batch size: 106, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:09:40,310 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:10:01,279 INFO [train.py:917] (1/4) Epoch 24, validation: loss=0.04248, audio_tagging_loss=0.04248, over 3737520.00 frames.
2023-12-20 18:10:01,280 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:10:04,232 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=1.052e+01
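The WithLoss lines report a "loss-sum" tied to a named activation. As a generic, hedged illustration of the pattern (not scaling.py's actual class): a module can return its input unchanged while exposing an auxiliary penalty that the training loop adds to the main loss, which is what a logger would then print as loss-sum.

```python
# Generic pass-through module with an auxiliary loss, as a hedged
# illustration only; the penalty and names here are hypothetical.
import torch

class WithAuxLoss(torch.nn.Module):
    def __init__(self, penalty, weight: float = 1.0):
        super().__init__()
        self.penalty = penalty  # callable: Tensor -> scalar Tensor
        self.weight = weight
        self.last_loss = None   # what a logger would print as loss-sum

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.last_loss = self.weight * self.penalty(x)
        return x  # activations pass through unchanged

mod = WithAuxLoss(lambda x: x.abs().mean())
y = mod(torch.randn(8, 16, requires_grad=True))
total_loss = y.pow(2).mean() + mod.last_loss  # main loss + auxiliary term
total_loss.backward()
```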
2023-12-20 18:10:06,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=7973.333333333333, ans=0.02
2023-12-20 18:10:12,545 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 2.742e+01 3.651e+01 4.128e+01 4.777e+01 1.617e+02, threshold=8.255e+01, percent-clipped=1.0
2023-12-20 18:10:15,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=8040.0, ans=0.6186
2023-12-20 18:10:17,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:17,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=8040.0, ans=0.125
2023-12-20 18:10:36,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8173.333333333333, ans=0.125
2023-12-20 18:10:49,626 INFO [train.py:886] (1/4) Epoch 24, batch 50, loss[loss=0.03358, audio_tagging_loss=0.03358, over 25000.00 frames. ], tot_loss[loss=0.03468, audio_tagging_loss=0.03468, over 1116019.87 frames. ], batch size: 100, lr: 1.73e-02, grad_scale: 32.0
2023-12-20 18:11:13,606 INFO [train.py:886] (1/4) Epoch 25, batch 0, loss[loss=0.03754, audio_tagging_loss=0.03754, over 24081.00 frames. ], tot_loss[loss=0.03754, audio_tagging_loss=0.03754, over 24081.00 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:11:13,607 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:11:34,708 INFO [train.py:917] (1/4) Epoch 25, validation: loss=0.04257, audio_tagging_loss=0.04257, over 3737520.00 frames.
2023-12-20 18:11:34,708 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:11:57,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=10.67
2023-12-20 18:12:02,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=8520.0, ans=0.125
2023-12-20 18:12:10,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8520.0, ans=0.2148
2023-12-20 18:12:10,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=7.4079999999999995
2023-12-20 18:12:11,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.32 vs. limit=9.26
2023-12-20 18:12:14,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8586.666666666666, ans=0.21413333333333334
2023-12-20 18:12:22,263 INFO [train.py:886] (1/4) Epoch 25, batch 50, loss[loss=0.0314, audio_tagging_loss=0.0314, over 25000.00 frames. ], tot_loss[loss=0.03346, audio_tagging_loss=0.03346, over 1117704.48 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 32.0
2023-12-20 18:12:45,038 INFO [train.py:886] (1/4) Epoch 26, batch 0, loss[loss=0.04567, audio_tagging_loss=0.04567, over 21448.00 frames. ], tot_loss[loss=0.04567, audio_tagging_loss=0.04567, over 21448.00 frames. ], batch size: 106, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:12:45,039 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:13:05,885 INFO [train.py:917] (1/4) Epoch 26, validation: loss=0.04241, audio_tagging_loss=0.04241, over 3737520.00 frames.
2023-12-20 18:13:05,886 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:13:12,409 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.047e+01 3.673e+01 4.044e+01 4.675e+01 8.607e+01, threshold=8.088e+01, percent-clipped=1.0
2023-12-20 18:13:24,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212
2023-12-20 18:13:25,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=7.52
2023-12-20 18:13:46,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=8933.333333333334, ans=0.5873333333333334
2023-12-20 18:13:53,005 INFO [train.py:886] (1/4) Epoch 26, batch 50, loss[loss=0.03165, audio_tagging_loss=0.03165, over 25000.00 frames. ], tot_loss[loss=0.03303, audio_tagging_loss=0.03303, over 1118245.57 frames. ], batch size: 100, lr: 1.66e-02, grad_scale: 32.0
2023-12-20 18:14:18,298 INFO [train.py:886] (1/4) Epoch 27, batch 0, loss[loss=0.04504, audio_tagging_loss=0.04504, over 20685.00 frames. ], tot_loss[loss=0.04504, audio_tagging_loss=0.04504, over 20685.00 frames. ], batch size: 106, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:14:18,299 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:14:31,255 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1727, 4.1085, 4.0626, 3.8466], device='cuda:1')
2023-12-20 18:14:37,832 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3524, 2.1953, 2.3511, 2.2864], device='cuda:1')
2023-12-20 18:14:39,329 INFO [train.py:917] (1/4) Epoch 27, validation: loss=0.04294, audio_tagging_loss=0.04294, over 3737520.00 frames.
2023-12-20 18:14:39,330 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:14:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9013.333333333334, ans=0.125
2023-12-20 18:14:54,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=25.74 vs. limit=10.905
2023-12-20 18:15:11,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=9213.333333333334, ans=0.125
2023-12-20 18:15:11,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=9213.333333333334, ans=0.0
2023-12-20 18:15:14,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=9213.333333333334, ans=0.125
2023-12-20 18:15:19,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9280.0, ans=0.2072
2023-12-20 18:15:19,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.13 vs. limit=14.46
2023-12-20 18:15:26,760 INFO [train.py:886] (1/4) Epoch 27, batch 50, loss[loss=0.03001, audio_tagging_loss=0.03001, over 25000.00 frames. ], tot_loss[loss=0.03177, audio_tagging_loss=0.03177, over 1123725.78 frames. ], batch size: 100, lr: 1.63e-02, grad_scale: 32.0
2023-12-20 18:15:48,259 INFO [train.py:886] (1/4) Epoch 28, batch 0, loss[loss=0.03468, audio_tagging_loss=0.03468, over 24092.00 frames. ], tot_loss[loss=0.03468, audio_tagging_loss=0.03468, over 24092.00 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:15:48,259 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:16:09,708 INFO [train.py:917] (1/4) Epoch 28, validation: loss=0.04282, audio_tagging_loss=0.04282, over 3737520.00 frames.
2023-12-20 18:16:09,709 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:16:12,513 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.131e+01 3.970e+01 4.630e+01 5.343e+01 9.281e+01, threshold=9.260e+01, percent-clipped=1.0
2023-12-20 18:16:25,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=9426.666666666666, ans=0.027388888888888893
2023-12-20 18:16:28,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=11.06
2023-12-20 18:16:33,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=9493.333333333334, ans=0.008805797101449275
2023-12-20 18:16:39,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=9560.0, ans=0.5654
2023-12-20 18:16:56,949 INFO [train.py:886] (1/4) Epoch 28, batch 50, loss[loss=0.02919, audio_tagging_loss=0.02919, over 25000.00 frames. ], tot_loss[loss=0.03098, audio_tagging_loss=0.03098, over 1121594.30 frames. ], batch size: 100, lr: 1.60e-02, grad_scale: 32.0
2023-12-20 18:17:19,797 INFO [train.py:886] (1/4) Epoch 29, batch 0, loss[loss=0.03225, audio_tagging_loss=0.03225, over 25000.00 frames. ], tot_loss[loss=0.03225, audio_tagging_loss=0.03225, over 25000.00 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:17:19,797 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:17:30,806 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7690, 2.3818, 2.4859, 2.7226], device='cuda:1')
2023-12-20 18:17:32,138 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.8887, 1.7890, 1.7039, 1.8671, 1.7203, 1.8254, 1.5409, 1.6900], device='cuda:1')
2023-12-20 18:17:40,754 INFO [train.py:917] (1/4) Epoch 29, validation: loss=0.04276, audio_tagging_loss=0.04276, over 3737520.00 frames.
2023-12-20 18:17:40,755 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:17:40,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=9706.666666666666, ans=0.125
2023-12-20 18:18:10,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=9906.666666666666, ans=0.125
2023-12-20 18:18:16,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=11.215
2023-12-20 18:18:29,133 INFO [train.py:886] (1/4) Epoch 29, batch 50, loss[loss=0.02827, audio_tagging_loss=0.02827, over 25000.00 frames. ], tot_loss[loss=0.02979, audio_tagging_loss=0.02979, over 1127124.95 frames. ], batch size: 100, lr: 1.57e-02, grad_scale: 32.0
2023-12-20 18:18:30,003 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.044e+01 4.177e+01 4.600e+01 5.564e+01 7.757e+01, threshold=9.200e+01, percent-clipped=0.0
2023-12-20 18:18:51,717 INFO [train.py:886] (1/4) Epoch 30, batch 0, loss[loss=0.03938, audio_tagging_loss=0.03938, over 20030.00 frames. ], tot_loss[loss=0.03938, audio_tagging_loss=0.03938, over 20030.00 frames. ], batch size: 106, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:18:51,718 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:19:02,612 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1611, 1.9390, 1.9798, 1.3439, 1.8295, 1.8108, 1.7752, 1.8664], device='cuda:1')
2023-12-20 18:19:04,668 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1123, 1.9960, 1.9897, 1.6441, 1.9536, 1.8918, 1.8109, 1.8462], device='cuda:1')
2023-12-20 18:19:12,597 INFO [train.py:917] (1/4) Epoch 30, validation: loss=0.04346, audio_tagging_loss=0.04346, over 3737520.00 frames.
2023-12-20 18:19:12,597 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:19:29,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.24 vs. limit=10.059999999999999
2023-12-20 18:19:30,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10120.0, ans=0.125
2023-12-20 18:19:45,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=10253.333333333334, ans=0.5411333333333334
2023-12-20 18:19:59,939 INFO [train.py:886] (1/4) Epoch 30, batch 50, loss[loss=0.02916, audio_tagging_loss=0.02916, over 25000.00 frames. ], tot_loss[loss=0.02901, audio_tagging_loss=0.02901, over 1119414.10 frames. ], batch size: 100, lr: 1.54e-02, grad_scale: 32.0
2023-12-20 18:20:00,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=10386.666666666666, ans=0.125
2023-12-20 18:20:22,372 INFO [train.py:886] (1/4) Epoch 31, batch 0, loss[loss=0.03037, audio_tagging_loss=0.03037, over 24148.00 frames. ], tot_loss[loss=0.03037, audio_tagging_loss=0.03037, over 24148.00 frames. ], batch size: 100, lr: 1.52e-02, grad_scale: 32.0
2023-12-20 18:20:22,372 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:20:32,884 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0702, 1.7091, 1.7310, 1.8776, 1.7862, 1.8434, 1.5666, 1.7082], device='cuda:1')
2023-12-20 18:20:33,514 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3741, 2.3594, 2.5183, 2.3064], device='cuda:1')
2023-12-20 18:20:43,507 INFO [train.py:917] (1/4) Epoch 31, validation: loss=0.04363, audio_tagging_loss=0.04363, over 3737520.00 frames.
2023-12-20 18:20:43,508 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:20:46,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=10400.0, ans=0.023333333333333334
2023-12-20 18:20:51,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10400.0, ans=0.125
2023-12-20 18:21:07,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.60 vs. limit=15.4
2023-12-20 18:21:14,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=10600.0, ans=0.022500000000000003
2023-12-20 18:21:29,125 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.385e+01 4.278e+01 4.904e+01 5.799e+01 1.168e+02, threshold=9.808e+01, percent-clipped=2.0
2023-12-20 18:21:30,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=10666.666666666666, ans=0.0
2023-12-20 18:21:31,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.07 vs. limit=10.366666666666667
2023-12-20 18:21:31,820 INFO [train.py:886] (1/4) Epoch 31, batch 50, loss[loss=0.02962, audio_tagging_loss=0.02962, over 25000.00 frames. ], tot_loss[loss=0.0283, audio_tagging_loss=0.0283, over 1126555.81 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 32.0
2023-12-20 18:21:54,502 INFO [train.py:886] (1/4) Epoch 32, batch 0, loss[loss=0.03147, audio_tagging_loss=0.03147, over 24121.00 frames. ], tot_loss[loss=0.03147, audio_tagging_loss=0.03147, over 24121.00 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:21:54,503 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:22:14,149 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2022, 1.9398, 1.8179, 1.8579], device='cuda:1')
2023-12-20 18:22:15,980 INFO [train.py:917] (1/4) Epoch 32, validation: loss=0.04494, audio_tagging_loss=0.04494, over 3737520.00 frames.
2023-12-20 18:22:15,980 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:22:21,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10746.666666666666, ans=0.19253333333333333
2023-12-20 18:22:25,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=10813.333333333334, ans=0.02161111111111111
2023-12-20 18:22:27,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=10813.333333333334, ans=0.125
2023-12-20 18:22:35,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=11.58
2023-12-20 18:22:38,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.51 vs. limit=15.66
2023-12-20 18:22:40,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10880.0, ans=0.19119999999999998
2023-12-20 18:22:40,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=10880.0, ans=0.021333333333333336
2023-12-20 18:22:44,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=10946.666666666666, ans=0.125
2023-12-20 18:22:47,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=10946.666666666666, ans=0.36419999999999997
2023-12-20 18:22:51,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10946.666666666666, ans=0.19053333333333333
2023-12-20 18:22:58,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11013.333333333334, ans=0.125
2023-12-20 18:22:58,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=11.629999999999999
2023-12-20 18:22:59,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.76
2023-12-20 18:23:02,777 INFO [train.py:886] (1/4) Epoch 32, batch 50, loss[loss=0.02606, audio_tagging_loss=0.02606, over 25000.00 frames. ], tot_loss[loss=0.02763, audio_tagging_loss=0.02763, over 1114963.93 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0
2023-12-20 18:23:22,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11093.333333333334, ans=0.18906666666666666
2023-12-20 18:23:25,184 INFO [train.py:886] (1/4) Epoch 33, batch 0, loss[loss=0.03231, audio_tagging_loss=0.03231, over 21830.00 frames. ], tot_loss[loss=0.03231, audio_tagging_loss=0.03231, over 21830.00 frames. ], batch size: 106, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:23:25,185 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:23:46,126 INFO [train.py:917] (1/4) Epoch 33, validation: loss=0.0459, audio_tagging_loss=0.0459, over 3737520.00 frames.
2023-12-20 18:23:46,126 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:23:48,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=11093.333333333334, ans=0.125
2023-12-20 18:23:59,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=11.684999999999999
2023-12-20 18:24:09,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11226.666666666666, ans=0.18773333333333334
2023-12-20 18:24:10,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=11.71
2023-12-20 18:24:20,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=11293.333333333334, ans=0.019611111111111107
2023-12-20 18:24:25,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=11360.0, ans=0.125
2023-12-20 18:24:26,656 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.339e+01 4.449e+01 5.027e+01 5.967e+01 1.050e+02, threshold=1.005e+02, percent-clipped=1.0
2023-12-20 18:24:29,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=4.704
2023-12-20 18:24:33,020 INFO [train.py:886] (1/4) Epoch 33, batch 50, loss[loss=0.02471, audio_tagging_loss=0.02471, over 25000.00 frames. ], tot_loss[loss=0.02602, audio_tagging_loss=0.02602, over 1125427.78 frames. ], batch size: 100, lr: 1.47e-02, grad_scale: 32.0
2023-12-20 18:24:54,840 INFO [train.py:886] (1/4) Epoch 34, batch 0, loss[loss=0.02736, audio_tagging_loss=0.02736, over 21449.00 frames. ], tot_loss[loss=0.02736, audio_tagging_loss=0.02736, over 21449.00 frames. ], batch size: 106, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:24:54,841 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:25:16,064 INFO [train.py:917] (1/4) Epoch 34, validation: loss=0.0463, audio_tagging_loss=0.0463, over 3737520.00 frames.
2023-12-20 18:25:16,065 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:25:18,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=11440.0, ans=0.019000000000000003
2023-12-20 18:25:32,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=11506.666666666666, ans=0.018722222222222223
2023-12-20 18:25:53,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=4.756
2023-12-20 18:26:02,680 INFO [train.py:886] (1/4) Epoch 34, batch 50, loss[loss=0.02452, audio_tagging_loss=0.02452, over 25000.00 frames. ], tot_loss[loss=0.02557, audio_tagging_loss=0.02557, over 1116622.69 frames. ], batch size: 100, lr: 1.44e-02, grad_scale: 32.0
2023-12-20 18:26:24,394 INFO [train.py:886] (1/4) Epoch 35, batch 0, loss[loss=0.02645, audio_tagging_loss=0.02645, over 24158.00 frames. ], tot_loss[loss=0.02645, audio_tagging_loss=0.02645, over 24158.00 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:26:24,395 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:26:43,983 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9411, 2.5709, 2.4494, 2.7765], device='cuda:1')
2023-12-20 18:26:45,180 INFO [train.py:917] (1/4) Epoch 35, validation: loss=0.04736, audio_tagging_loss=0.04736, over 3737520.00 frames.
2023-12-20 18:26:45,181 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:26:49,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=7.946666666666666
2023-12-20 18:27:08,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=11920.0, ans=0.4828
2023-12-20 18:27:13,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=11986.666666666666, ans=0.125
2023-12-20 18:27:17,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11986.666666666666, ans=0.18013333333333334
2023-12-20 18:27:23,466 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.764e+01 4.533e+01 5.198e+01 5.955e+01 1.043e+02, threshold=1.040e+02, percent-clipped=1.0
2023-12-20 18:27:33,753 INFO [train.py:886] (1/4) Epoch 35, batch 50, loss[loss=0.02185, audio_tagging_loss=0.02185, over 25000.00 frames. ], tot_loss[loss=0.02465, audio_tagging_loss=0.02465, over 1122465.19 frames. ], batch size: 100, lr: 1.42e-02, grad_scale: 32.0
2023-12-20 18:27:55,033 INFO [train.py:886] (1/4) Epoch 36, batch 0, loss[loss=0.02595, audio_tagging_loss=0.02595, over 24087.00 frames. ], tot_loss[loss=0.02595, audio_tagging_loss=0.02595, over 24087.00 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:27:55,033 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:28:12,154 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.2063, 2.9432, 2.8897, 2.9267], device='cuda:1')
2023-12-20 18:28:16,075 INFO [train.py:917] (1/4) Epoch 36, validation: loss=0.04841, audio_tagging_loss=0.04841, over 3737520.00 frames.
2023-12-20 18:28:16,076 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:28:16,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=8.033333333333333
2023-12-20 18:28:19,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=12133.333333333334, ans=0.01611111111111111
2023-12-20 18:28:23,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=8.033333333333333
2023-12-20 18:28:25,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=12200.0, ans=0.008217391304347826
2023-12-20 18:28:30,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=12.075
2023-12-20 18:28:40,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=12.1
2023-12-20 18:28:43,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=12333.333333333334, ans=0.125
2023-12-20 18:28:51,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=12.125
2023-12-20 18:29:03,191 INFO [train.py:886] (1/4) Epoch 36, batch 50, loss[loss=0.02623, audio_tagging_loss=0.02623, over 25000.00 frames. ], tot_loss[loss=0.02421, audio_tagging_loss=0.02421, over 1120299.31 frames. ], batch size: 100, lr: 1.40e-02, grad_scale: 32.0
2023-12-20 18:29:24,444 INFO [train.py:886] (1/4) Epoch 37, batch 0, loss[loss=0.03095, audio_tagging_loss=0.03095, over 21211.00 frames. ], tot_loss[loss=0.03095, audio_tagging_loss=0.03095, over 21211.00 frames. ], batch size: 106, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:29:24,445 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:29:34,436 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0752, 4.7395, 4.6235, 4.3424], device='cuda:1')
2023-12-20 18:29:45,682 INFO [train.py:917] (1/4) Epoch 37, validation: loss=0.04928, audio_tagging_loss=0.04928, over 3737520.00 frames.
2023-12-20 18:29:45,683 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:29:51,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=4.872
2023-12-20 18:29:59,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12546.666666666666, ans=0.17453333333333335
2023-12-20 18:30:01,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=12546.666666666666, ans=0.125
2023-12-20 18:30:02,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.205
2023-12-20 18:30:09,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=12613.333333333334, ans=0.45853333333333335
2023-12-20 18:30:11,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=12.23
2023-12-20 18:30:17,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=17.009999999999998
2023-12-20 18:30:19,000 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.554e+01 4.732e+01 5.545e+01 6.466e+01 1.044e+02, threshold=1.109e+02, percent-clipped=1.0
2023-12-20 18:30:22,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=12680.0, ans=0.013833333333333336
2023-12-20 18:30:27,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=12746.666666666666, ans=0.125
2023-12-20 18:30:30,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=12746.666666666666, ans=0.02
2023-12-20 18:30:32,821 INFO [train.py:886] (1/4) Epoch 37, batch 50, loss[loss=0.02155, audio_tagging_loss=0.02155, over 25000.00 frames. ], tot_loss[loss=0.02298, audio_tagging_loss=0.02298, over 1120275.84 frames. ], batch size: 100, lr: 1.38e-02, grad_scale: 32.0
2023-12-20 18:30:55,799 INFO [train.py:886] (1/4) Epoch 38, batch 0, loss[loss=0.01982, audio_tagging_loss=0.01982, over 25000.00 frames. ], tot_loss[loss=0.01982, audio_tagging_loss=0.01982, over 25000.00 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:30:55,799 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:31:16,995 INFO [train.py:917] (1/4) Epoch 38, validation: loss=0.04916, audio_tagging_loss=0.04916, over 3737520.00 frames.
2023-12-20 18:31:16,996 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:31:20,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12826.666666666666, ans=0.125
2023-12-20 18:31:24,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=12826.666666666666, ans=0.013222222222222225
2023-12-20 18:31:26,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=12893.333333333334, ans=0.035
2023-12-20 18:31:39,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12960.0, ans=0.1704
2023-12-20 18:31:40,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.25 vs. limit=17.22
2023-12-20 18:32:04,831 INFO [train.py:886] (1/4) Epoch 38, batch 50, loss[loss=0.02165, audio_tagging_loss=0.02165, over 25000.00 frames. ], tot_loss[loss=0.02196, audio_tagging_loss=0.02196, over 1123455.74 frames. ], batch size: 100, lr: 1.36e-02, grad_scale: 32.0
2023-12-20 18:32:26,418 INFO [train.py:886] (1/4) Epoch 39, batch 0, loss[loss=0.0235, audio_tagging_loss=0.0235, over 24081.00 frames. ], tot_loss[loss=0.0235, audio_tagging_loss=0.0235, over 24081.00 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:32:26,419 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:32:46,859 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.7960, 2.1194, 2.1444, 2.2305, 2.1937, 2.3078, 2.0764, 2.0770], device='cuda:1')
2023-12-20 18:32:47,552 INFO [train.py:917] (1/4) Epoch 39, validation: loss=0.05058, audio_tagging_loss=0.05058, over 3737520.00 frames.
2023-12-20 18:32:47,552 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:32:49,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.94 vs. limit=11.586666666666666
2023-12-20 18:32:53,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=12.440000000000001
2023-12-20 18:33:05,195 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.626e-01
2023-12-20 18:33:05,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.60 vs. limit=8.31
2023-12-20 18:33:14,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=13306.666666666666, ans=0.43426666666666675
2023-12-20 18:33:17,346 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.971e+01 5.139e+01 5.911e+01 6.986e+01 1.449e+02, threshold=1.182e+02, percent-clipped=3.0
2023-12-20 18:33:26,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13440.0, ans=0.1656
2023-12-20 18:33:33,695 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.493e-01
2023-12-20 18:33:35,341 INFO [train.py:886] (1/4) Epoch 39, batch 50, loss[loss=0.02042, audio_tagging_loss=0.02042, over 25000.00 frames. ], tot_loss[loss=0.02149, audio_tagging_loss=0.02149, over 1121525.25 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0
2023-12-20 18:33:57,926 INFO [train.py:886] (1/4) Epoch 40, batch 0, loss[loss=0.02105, audio_tagging_loss=0.02105, over 24062.00 frames. ], tot_loss[loss=0.02105, audio_tagging_loss=0.02105, over 24062.00 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:33:57,927 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:34:19,045 INFO [train.py:917] (1/4) Epoch 40, validation: loss=0.05208, audio_tagging_loss=0.05208, over 3737520.00 frames.
2023-12-20 18:34:19,046 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:34:40,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=13653.333333333334, ans=0.125
2023-12-20 18:34:44,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=13653.333333333334, ans=0.8865333333333333
2023-12-20 18:34:55,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=13720.0, ans=0.009500000000000001
2023-12-20 18:35:04,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.64 vs. limit=8.446666666666665
2023-12-20 18:35:06,547 INFO [train.py:886] (1/4) Epoch 40, batch 50, loss[loss=0.02269, audio_tagging_loss=0.02269, over 25000.00 frames. ], tot_loss[loss=0.02017, audio_tagging_loss=0.02017, over 1121341.24 frames. ], batch size: 100, lr: 1.32e-02, grad_scale: 32.0
2023-12-20 18:35:29,525 INFO [train.py:886] (1/4) Epoch 41, batch 0, loss[loss=0.01894, audio_tagging_loss=0.01894, over 25000.00 frames. ], tot_loss[loss=0.01894, audio_tagging_loss=0.01894, over 25000.00 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:35:29,526 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:35:47,521 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3157, 2.9804, 3.3457, 3.0635], device='cuda:1')
2023-12-20 18:35:50,409 INFO [train.py:917] (1/4) Epoch 41, validation: loss=0.05259, audio_tagging_loss=0.05259, over 3737520.00 frames.
2023-12-20 18:35:50,409 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:36:14,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=14000.0, ans=0.125
2023-12-20 18:36:16,658 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.775e+01 5.160e+01 5.694e+01 6.780e+01 1.124e+02, threshold=1.139e+02, percent-clipped=0.0
2023-12-20 18:36:17,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=14000.0, ans=0.10999999999999999
2023-12-20 18:36:31,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=14133.333333333334, ans=0.125
2023-12-20 18:36:37,870 INFO [train.py:886] (1/4) Epoch 41, batch 50, loss[loss=0.0187, audio_tagging_loss=0.0187, over 25000.00 frames. ], tot_loss[loss=0.01926, audio_tagging_loss=0.01926, over 1116780.63 frames. ], batch size: 100, lr: 1.30e-02, grad_scale: 32.0
2023-12-20 18:37:00,622 INFO [train.py:886] (1/4) Epoch 42, batch 0, loss[loss=0.01991, audio_tagging_loss=0.01991, over 24159.00 frames. ], tot_loss[loss=0.01991, audio_tagging_loss=0.01991, over 24159.00 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:37:00,622 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:37:21,715 INFO [train.py:917] (1/4) Epoch 42, validation: loss=0.0541, audio_tagging_loss=0.0541, over 3737520.00 frames.
2023-12-20 18:37:21,716 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:37:27,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=14213.333333333334, ans=0.125
2023-12-20 18:37:34,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=14280.0, ans=0.125
2023-12-20 18:37:35,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=14280.0, ans=0.125
2023-12-20 18:37:35,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=14280.0, ans=0.8928
2023-12-20 18:37:41,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=14346.666666666666, ans=0.4152
2023-12-20 18:37:48,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=14346.666666666666, ans=0.006888888888888889
2023-12-20 18:38:09,808 INFO [train.py:886] (1/4) Epoch 42, batch 50, loss[loss=0.01856, audio_tagging_loss=0.01856, over 25000.00 frames. ], tot_loss[loss=0.01859, audio_tagging_loss=0.01859, over 1119067.54 frames. ], batch size: 100, lr: 1.29e-02, grad_scale: 32.0
2023-12-20 18:38:32,307 INFO [train.py:886] (1/4) Epoch 43, batch 0, loss[loss=0.02681, audio_tagging_loss=0.02681, over 20614.00 frames. ], tot_loss[loss=0.02681, audio_tagging_loss=0.02681, over 20614.00 frames. ], batch size: 106, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:38:32,308 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:38:40,438 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.1897, 2.6265, 2.5322, 2.9896], device='cuda:1')
2023-12-20 18:38:53,029 INFO [train.py:917] (1/4) Epoch 43, validation: loss=0.05602, audio_tagging_loss=0.05602, over 3737520.00 frames.
2023-12-20 18:38:53,030 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:38:53,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=14560.0, ans=0.006000000000000005
2023-12-20 18:39:01,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.95 vs. limit=12.96
2023-12-20 18:39:02,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=14560.0, ans=0.125
2023-12-20 18:39:02,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.98 vs. limit=8.64
2023-12-20 18:39:16,031 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.316e+01 5.471e+01 6.063e+01 6.688e+01 1.130e+02, threshold=1.213e+02, percent-clipped=0.0
2023-12-20 18:39:17,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=9.877333333333333
2023-12-20 18:39:29,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=14760.0, ans=0.3834000000000001
2023-12-20 18:39:38,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=14826.666666666666, ans=0.125
2023-12-20 18:39:39,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=13.059999999999999
2023-12-20 18:39:41,477 INFO [train.py:886] (1/4) Epoch 43, batch 50, loss[loss=0.0175, audio_tagging_loss=0.0175, over 25000.00 frames. ], tot_loss[loss=0.01777, audio_tagging_loss=0.01777, over 1117128.51 frames. ], batch size: 100, lr: 1.27e-02, grad_scale: 32.0
2023-12-20 18:40:04,355 INFO [train.py:886] (1/4) Epoch 44, batch 0, loss[loss=0.01599, audio_tagging_loss=0.01599, over 24088.00 frames. ], tot_loss[loss=0.01599, audio_tagging_loss=0.01599, over 24088.00 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:40:04,355 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:40:25,327 INFO [train.py:917] (1/4) Epoch 44, validation: loss=0.05682, audio_tagging_loss=0.05682, over 3737520.00 frames.
2023-12-20 18:40:25,328 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:40:31,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=14906.666666666666, ans=0.125
2023-12-20 18:40:32,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.58 vs. limit=13.09
2023-12-20 18:40:40,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14973.333333333334, ans=0.125
2023-12-20 18:40:40,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14973.333333333334, ans=0.15026666666666666
2023-12-20 18:40:52,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15040.0, ans=0.125
2023-12-20 18:40:57,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15106.666666666666, ans=0.125
2023-12-20 18:41:10,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=15173.333333333334, ans=0.09826666666666664
2023-12-20 18:41:10,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=13.190000000000001
2023-12-20 18:41:12,865 INFO [train.py:886] (1/4) Epoch 44, batch 50, loss[loss=0.01775, audio_tagging_loss=0.01775, over 25000.00 frames. ], tot_loss[loss=0.01683, audio_tagging_loss=0.01683, over 1115327.13 frames. ], batch size: 100, lr: 1.25e-02, grad_scale: 32.0
2023-12-20 18:41:35,895 INFO [train.py:886] (1/4) Epoch 45, batch 0, loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], tot_loss[loss=0.01589, audio_tagging_loss=0.01589, over 25000.00 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 32.0
2023-12-20 18:41:35,896 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:41:56,901 INFO [train.py:917] (1/4) Epoch 45, validation: loss=0.05811, audio_tagging_loss=0.05811, over 3737520.00 frames.
2023-12-20 18:41:56,902 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:42:05,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=15253.333333333334, ans=0.003111111111111106
2023-12-20 18:42:15,214 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.876e+01 5.082e+01 5.625e+01 6.615e+01 1.122e+02, threshold=1.125e+02, percent-clipped=0.0
2023-12-20 18:42:23,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=15386.666666666666, ans=0.002555555555555554
2023-12-20 18:42:26,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15453.333333333334, ans=0.125
2023-12-20 18:42:29,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15453.333333333334, ans=0.14546666666666666
2023-12-20 18:42:34,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=13.32
2023-12-20 18:42:44,420 INFO [train.py:886] (1/4) Epoch 45, batch 50, loss[loss=0.01693, audio_tagging_loss=0.01693, over 25000.00 frames. ], tot_loss[loss=0.01692, audio_tagging_loss=0.01692, over 1112449.27 frames. ], batch size: 100, lr: 1.24e-02, grad_scale: 64.0
2023-12-20 18:43:02,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=15600.0, ans=0.0
2023-12-20 18:43:03,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=13.35
2023-12-20 18:43:06,810 INFO [train.py:886] (1/4) Epoch 46, batch 0, loss[loss=0.01734, audio_tagging_loss=0.01734, over 24100.00 frames. ], tot_loss[loss=0.01734, audio_tagging_loss=0.01734, over 24100.00 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:43:06,811 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:43:18,196 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5318, 2.7592, 2.9153, 2.9462], device='cuda:1')
2023-12-20 18:43:27,878 INFO [train.py:917] (1/4) Epoch 46, validation: loss=0.05956, audio_tagging_loss=0.05956, over 3737520.00 frames.
2023-12-20 18:43:27,879 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:43:34,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=15600.0, ans=0.007478260869565217
2023-12-20 18:43:37,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=15666.666666666666, ans=10.0
2023-12-20 18:43:39,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=15666.666666666666, ans=0.9066666666666666
2023-12-20 18:43:57,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=13.425
2023-12-20 18:44:07,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=15866.666666666666, ans=0.125
2023-12-20 18:44:08,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=10.346666666666668
2023-12-20 18:44:15,171 INFO [train.py:886] (1/4) Epoch 46, batch 50, loss[loss=0.01433, audio_tagging_loss=0.01433, over 25000.00 frames. ], tot_loss[loss=0.01549, audio_tagging_loss=0.01549, over 1126431.18 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 64.0
2023-12-20 18:44:38,146 INFO [train.py:886] (1/4) Epoch 47, batch 0, loss[loss=0.01727, audio_tagging_loss=0.01727, over 24049.00 frames. ], tot_loss[loss=0.01727, audio_tagging_loss=0.01727, over 24049.00 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:44:38,147 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:44:48,611 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.3128, 2.9586, 3.3100, 2.8855], device='cuda:1')
2023-12-20 18:44:59,322 INFO [train.py:917] (1/4) Epoch 47, validation: loss=0.06125, audio_tagging_loss=0.06125, over 3737520.00 frames.
2023-12-20 18:44:59,323 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:45:01,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15946.666666666666, ans=0.14053333333333334
2023-12-20 18:45:01,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=15946.666666666666, ans=0.125
2023-12-20 18:45:06,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=13.48
2023-12-20 18:45:14,000 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.428e+01 5.199e+01 5.973e+01 6.776e+01 1.435e+02, threshold=1.195e+02, percent-clipped=1.0
2023-12-20 18:45:16,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=16013.333333333334, ans=0.33953333333333335
2023-12-20 18:45:32,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=16146.666666666666, ans=0.007359420289855072
2023-12-20 18:45:32,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=16146.666666666666, ans=0.125
2023-12-20 18:45:38,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16213.333333333334, ans=0.13786666666666667
2023-12-20 18:45:40,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.53 vs. limit=13.58
2023-12-20 18:45:46,334 INFO [train.py:886] (1/4) Epoch 47, batch 50, loss[loss=0.01334, audio_tagging_loss=0.01334, over 25000.00 frames. ], tot_loss[loss=0.01472, audio_tagging_loss=0.01472, over 1119525.87 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 64.0
2023-12-20 18:46:04,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=16293.333333333334, ans=0.125
2023-12-20 18:46:08,721 INFO [train.py:886] (1/4) Epoch 48, batch 0, loss[loss=0.03003, audio_tagging_loss=0.03003, over 21467.00 frames. ], tot_loss[loss=0.03003, audio_tagging_loss=0.03003, over 21467.00 frames. ], batch size: 106, lr: 1.20e-02, grad_scale: 64.0
2023-12-20 18:46:08,722 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:46:29,405 INFO [train.py:917] (1/4) Epoch 48, validation: loss=0.06238, audio_tagging_loss=0.06238, over 3737520.00 frames.
2023-12-20 18:46:29,406 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:46:45,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=16360.0, ans=0.125
2023-12-20 18:46:55,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=5.464
2023-12-20 18:47:12,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=13.71
2023-12-20 18:47:13,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=13.71
2023-12-20 18:47:16,769 INFO [train.py:886] (1/4) Epoch 48, batch 50, loss[loss=0.01539, audio_tagging_loss=0.01539, over 25000.00 frames. ], tot_loss[loss=0.01512, audio_tagging_loss=0.01512, over 1119249.00 frames. ], batch size: 100, lr: 1.19e-02, grad_scale: 64.0
2023-12-20 18:47:37,824 INFO [train.py:886] (1/4) Epoch 49, batch 0, loss[loss=0.0168, audio_tagging_loss=0.0168, over 24182.00 frames. ], tot_loss[loss=0.0168, audio_tagging_loss=0.0168, over 24182.00 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:47:37,825 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:47:58,816 INFO [train.py:917] (1/4) Epoch 49, validation: loss=0.06394, audio_tagging_loss=0.06394, over 3737520.00 frames.
2023-12-20 18:47:58,817 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:48:07,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=5.496
2023-12-20 18:48:07,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=16706.666666666668, ans=0.125
2023-12-20 18:48:09,447 WARNING [optim.py:484] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.348e+01 5.324e+01 6.019e+01 6.956e+01 1.317e+02, threshold=1.204e+02, percent-clipped=1.0
2023-12-20 18:48:21,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=16773.333333333332, ans=0.125
2023-12-20 18:48:28,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=16840.0, ans=0.125
2023-12-20 18:48:42,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=16906.666666666668, ans=0.125
2023-12-20 18:48:45,803 INFO [train.py:886] (1/4) Epoch 49, batch 50, loss[loss=0.01189, audio_tagging_loss=0.01189, over 25000.00 frames. ], tot_loss[loss=0.01372, audio_tagging_loss=0.01372, over 1119978.83 frames. ], batch size: 100, lr: 1.18e-02, grad_scale: 64.0
2023-12-20 18:49:07,494 INFO [train.py:886] (1/4) Epoch 50, batch 0, loss[loss=0.0152, audio_tagging_loss=0.0152, over 24192.00 frames. ], tot_loss[loss=0.0152, audio_tagging_loss=0.0152, over 24192.00 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 64.0
2023-12-20 18:49:07,495 INFO [train.py:909] (1/4) Computing validation loss
2023-12-20 18:49:17,367 INFO [zipformer.py:1858] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1453, 2.7847, 2.7589, 2.6183], device='cuda:1')
2023-12-20 18:49:28,225 INFO [train.py:917] (1/4) Epoch 50, validation: loss=0.06678, audio_tagging_loss=0.06678, over 3737520.00 frames.
2023-12-20 18:49:28,226 INFO [train.py:918] (1/4) Maximum memory allocated so far is 14870MB
2023-12-20 18:49:33,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=16986.666666666668, ans=0.0
2023-12-20 18:50:05,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=10.901333333333334
2023-12-20 18:50:06,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=17253.333333333332, ans=0.125
2023-12-20 18:50:08,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=13.969999999999999
2023-12-20 18:50:14,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=17320.0, ans=0.0
2023-12-20 18:50:15,459 INFO [train.py:886] (1/4) Epoch 50, batch 50, loss[loss=0.01495, audio_tagging_loss=0.01495, over 25000.00 frames. ], tot_loss[loss=0.0128, audio_tagging_loss=0.0128, over 1124283.53 frames. ], batch size: 100, lr: 1.17e-02, grad_scale: 32.0
2023-12-20 18:50:18,119 INFO [train.py:1099] (1/4) Done!