2024-08-19 16:40:02,054 INFO [train_multi_KD3.py:1188] (3/4) Training started
2024-08-19 16:40:02,054 INFO [train_multi_KD3.py:1198] (3/4) Device: cuda:3
2024-08-19 16:40:02,054 INFO [train_multi_KD3.py:1214] (3/4) Using dtype=torch.bfloat16
2024-08-19 16:40:02,054 INFO [train_multi_KD3.py:1216] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': '3210a8ed-dirty', 'icefall-git-date': 'Mon Aug 19 16:16:48 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 31, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale':
10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-19 16:40:02,055 INFO [train_multi_KD3.py:1218] (3/4) About to create model
2024-08-19
16:40:02,401 INFO [model_shift.py:142] (3/4) Delta_t: 6 when computing the distillation loss
2024-08-19 16:40:02,405 INFO [train_multi_KD3.py:1222] (3/4) Number of model parameters: 66484678
2024-08-19 16:40:02,406 INFO [checkpoint.py:112] (3/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-30.pt
2024-08-19 16:40:04,690 INFO [train_multi_KD3.py:1237] (3/4) Using DDP
2024-08-19 16:40:06,068 INFO [train_multi_KD3.py:1249] (3/4) Loading optimizer state dict
2024-08-19 16:40:06,339 INFO [train_multi_KD3.py:1257] (3/4) Loading scheduler state dict
2024-08-19 16:40:06,339 INFO [kd_datamodule.py:690] (3/4) About to get train 960 cuts
2024-08-19 16:40:06,382 INFO [kd_datamodule.py:862] (3/4) About to get the voxceleb cuts.
2024-08-19 16:40:06,383 INFO [kd_datamodule.py:873] (3/4) Adding voxceleb2 cuts.
2024-08-19 16:40:06,384 INFO [train_multi_KD3.py:1320] (3/4) Getting audioset cuts
2024-08-19 16:40:06,385 INFO [kd_datamodule.py:881] (3/4) About to get the audioset cuts for KD.
2024-08-19 16:40:06,387 INFO [train_multi_KD3.py:1326] (3/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-19 16:40:14,481 INFO [train_multi_KD3.py:1328] (3/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1187704) [underlying data type: ], CutSet(len=1904746) [underlying data type: ]]
2024-08-19 16:40:14,481 INFO [train_multi_KD3.py:1329] (3/4) Using weights: [1406195, 1187704, 1904746]
2024-08-19 16:40:14,482 INFO [train_multi_KD3.py:1338] (3/4) CutSet(len=4498645) [underlying data type: ]
2024-08-19 16:40:14,482 INFO [kd_datamodule.py:449] (3/4) Disable MUSAN
2024-08-19 16:40:14,483 INFO [kd_datamodule.py:489] (3/4) Disable SpecAugment
2024-08-19 16:40:14,483 INFO [kd_datamodule.py:491] (3/4) About to create train dataset
2024-08-19 16:40:14,483 INFO [kd_datamodule.py:528] (3/4) Using SimpleCutSampler
2024-08-19 16:40:14,484 INFO [kd_datamodule.py:536] (3/4) About to create train dataloader
2024-08-19 16:40:14,486 INFO [kd_datamodule.py:756] (3/4) About to get dev-clean cuts
2024-08-19 16:40:14,488 INFO [kd_datamodule.py:774] (3/4) About to get dev-other cuts
2024-08-19 16:40:14,489 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset
2024-08-19 16:40:14,773 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader
2024-08-19 16:40:14,773 INFO [kd_datamodule.py:833] (3/4) About to get the test set of voxceleb1 set.
2024-08-19 16:40:14,774 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset
2024-08-19 16:40:15,009 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader
2024-08-19 16:40:15,009 INFO [kd_datamodule.py:893] (3/4) About to get the audioset eval cuts.
2024-08-19 16:40:15,010 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset
2024-08-19 16:40:15,512 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader
2024-08-19 16:40:15,513 INFO [train_multi_KD3.py:1418] (3/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-19 16:40:15,513 INFO [train_multi_KD3.py:1422] (3/4) Loading grad scaler state dict
2024-08-19 16:40:31,713 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 0, loss[loss=0.1271, beats_loss=0.007867, ecapa_loss=0.00014, whisper_loss=0.1178, over 24166.00 frames. ], tot_loss[loss=0.1271, beats_loss=0.007867, ecapa_loss=0.00014, whisper_loss=0.1178, over 24166.00 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:40:31,714 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss
2024-08-19 16:41:05,598 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005148, whisper_loss=0.2478, over 931116.00 frames.
2024-08-19 16:41:25,409 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003992, beats_loss=0, ecapa_loss=0.0003992, whisper_loss=0, over 944235.00 frames.
2024-08-19 16:42:59,859 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-19 16:42:59,861 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB
2024-08-19 16:43:46,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4445990.0, ans=0.125
2024-08-19 16:43:47,940 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 fro AS
2024-08-19 16:43:54,953 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts.
23 from LS+wenet, 25 from Vox, 46 fro AS
2024-08-19 16:43:57,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4446090.0, ans=0.0
2024-08-19 16:44:06,313 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS
2024-08-19 16:44:26,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4446190.0, ans=0.2
2024-08-19 16:44:34,275 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS
2024-08-19 16:44:43,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.403e+01 2.746e+01 3.133e+01 8.282e+01, threshold=5.492e+01, percent-clipped=1.0
2024-08-19 16:44:56,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.55 vs. limit=22.5
2024-08-19 16:45:00,515 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 50, loss[loss=0.1193, beats_loss=0.008474, ecapa_loss=0.0001498, whisper_loss=0.1093, over 18443.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009542, ecapa_loss=0.0001396, whisper_loss=0.09069, over 857640.59 frames. ], batch size: 71, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:45:02,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4446390.0, ans=0.125
2024-08-19 16:45:34,728 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts.
27 from LS+wenet, 15 from Vox, 36 fro AS
2024-08-19 16:45:50,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4446590.0, ans=0.125
2024-08-19 16:45:59,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4446590.0, ans=0.0
2024-08-19 16:46:12,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4446690.0, ans=0.125
2024-08-19 16:46:19,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0
2024-08-19 16:46:45,590 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS
2024-08-19 16:46:45,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4446790.0, ans=0.125
2024-08-19 16:46:55,260 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 100, loss[loss=0.1076, beats_loss=0.009266, ecapa_loss=0.0001383, whisper_loss=0.09697, over 22884.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.00947, ecapa_loss=0.0001402, whisper_loss=0.08903, over 1523204.32 frames.
], batch size: 90, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:47:20,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4446990.0, ans=0.125
2024-08-19 16:48:24,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.615e+01 2.787e+01 3.101e+01 5.493e+01, threshold=5.575e+01, percent-clipped=1.0
2024-08-19 16:48:27,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4447290.0, ans=0.0
2024-08-19 16:48:28,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0
2024-08-19 16:48:40,691 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 150, loss[loss=0.09259, beats_loss=0.00773, ecapa_loss=0.0001634, whisper_loss=0.08322, over 15951.00 frames. ], tot_loss[loss=0.09953, beats_loss=0.009404, ecapa_loss=0.000143, whisper_loss=0.0887, over 2029759.74 frames. ], batch size: 65, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:48:49,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=12.0
2024-08-19 16:48:52,908 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 16:49:02,099 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS
2024-08-19 16:49:11,026 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS
2024-08-19 16:49:32,119 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS
2024-08-19 16:49:40,609 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts.
19 from LS+wenet, 15 from Vox, 30 fro AS
2024-08-19 16:50:13,651 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 200, loss[loss=0.1017, beats_loss=0.008891, ecapa_loss=0.0001539, whisper_loss=0.09128, over 17382.00 frames. ], tot_loss[loss=0.09969, beats_loss=0.00962, ecapa_loss=0.0001419, whisper_loss=0.08865, over 2403574.12 frames. ], batch size: 70, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:50:19,185 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS
2024-08-19 16:50:32,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4447990.0, ans=0.125
2024-08-19 16:50:37,787 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 34 from LS+wenet, 15 from Vox, 31 fro AS
2024-08-19 16:51:08,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4448190.0, ans=0.125
2024-08-19 16:51:16,835 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 fro AS
2024-08-19 16:51:20,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4448290.0, ans=0.125
2024-08-19 16:51:24,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.343e+01 2.586e+01 2.857e+01 5.487e+01, threshold=5.172e+01, percent-clipped=0.0
2024-08-19 16:51:37,997 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 250, loss[loss=0.1148, beats_loss=0.009637, ecapa_loss=0.0001444, whisper_loss=0.1037, over 24591.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.009782, ecapa_loss=0.0001415, whisper_loss=0.0889, over 2696267.76 frames. ], batch size: 95, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:51:45,273 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts.
33 from LS+wenet, 19 from Vox, 40 fro AS
2024-08-19 16:51:54,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4448490.0, ans=0.125
2024-08-19 16:51:57,105 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS
2024-08-19 16:51:58,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4448490.0, ans=0.125
2024-08-19 16:52:09,907 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS
2024-08-19 16:52:14,580 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 21 from LS+wenet, 32 from Vox, 42 fro AS
2024-08-19 16:52:14,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4448590.0, ans=0.125
2024-08-19 16:52:50,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5
2024-08-19 16:52:54,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=12.0
2024-08-19 16:52:56,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0
2024-08-19 16:53:01,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4448790.0, ans=0.0
2024-08-19 16:53:03,187 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 300, loss[loss=0.09519, beats_loss=0.01015, ecapa_loss=0.0001123, whisper_loss=0.08391, over 14580.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.009959, ecapa_loss=0.0001409, whisper_loss=0.08935, over 2943059.88 frames.
], batch size: 53, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:53:19,624 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 27 from LS+wenet, 19 from Vox, 23 fro AS
2024-08-19 16:53:24,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4448990.0, ans=0.125
2024-08-19 16:53:26,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4448990.0, ans=0.1
2024-08-19 16:53:35,221 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 20 from LS+wenet, 13 from Vox, 20 fro AS
2024-08-19 16:53:37,065 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 16:53:37,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4449090.0, ans=0.2
2024-08-19 16:53:44,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4449090.0, ans=0.125
2024-08-19 16:53:54,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4449190.0, ans=0.125
2024-08-19 16:53:59,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4449190.0, ans=0.0
2024-08-19 16:54:11,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.292e+01 2.578e+01 2.922e+01 3.653e+02, threshold=5.156e+01, percent-clipped=3.0
2024-08-19 16:54:18,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4449290.0, ans=0.1
2024-08-19 16:54:21,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4449290.0, ans=0.125
2024-08-19 16:54:23,845
INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 350, loss[loss=0.05966, beats_loss=0.01093, ecapa_loss=0.0001121, whisper_loss=0.04761, over 14255.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01006, ecapa_loss=0.0001397, whisper_loss=0.08881, over 3093974.18 frames. ], batch size: 52, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:54:33,649 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-19 16:54:46,255 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 fro AS
2024-08-19 16:54:59,104 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 fro AS
2024-08-19 16:55:02,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4449590.0, ans=0.2
2024-08-19 16:55:38,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0
2024-08-19 16:55:42,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0
2024-08-19 16:55:43,245 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 400, loss[loss=0.1054, beats_loss=0.01047, ecapa_loss=0.0001529, whisper_loss=0.09344, over 16011.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01011, ecapa_loss=0.00014, whisper_loss=0.08972, over 3262108.97 frames. ], batch size: 66, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:55:47,474 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 28 from LS+wenet, 20 from Vox, 28 fro AS
2024-08-19 16:56:09,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.12 vs.
limit=15.0
2024-08-19 16:56:31,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4450190.0, ans=0.125
2024-08-19 16:56:52,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.167e+01 2.459e+01 2.674e+01 3.849e+01, threshold=4.918e+01, percent-clipped=0.0
2024-08-19 16:56:52,934 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 16:57:03,515 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08908264338970184, model_norm_threshold=49.18006896972656
2024-08-19 16:57:03,678 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.931e+04, grad_sumsq=3.931e+04, orig_rms_sq=1.000e+00
2024-08-19 16:57:05,399 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 450, loss[loss=0.1001, beats_loss=0.01031, ecapa_loss=0.0001361, whisper_loss=0.08839, over 21794.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01011, ecapa_loss=0.0001401, whisper_loss=0.08986, over 3410370.58 frames. ], batch size: 86, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:57:20,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=4450490.0, ans=0.2
2024-08-19 16:57:35,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4450490.0, ans=0.0
2024-08-19 16:57:39,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5
2024-08-19 16:57:45,487 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts.
21 from LS+wenet, 25 from Vox, 36 fro AS
2024-08-19 16:57:50,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4450590.0, ans=0.125
2024-08-19 16:57:50,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4450590.0, ans=0.125
2024-08-19 16:58:04,062 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 fro AS
2024-08-19 16:58:11,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0
2024-08-19 16:58:25,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4450790.0, ans=0.2
2024-08-19 16:58:28,190 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 500, loss[loss=0.113, beats_loss=0.009353, ecapa_loss=0.0001714, whisper_loss=0.1019, over 22733.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01014, ecapa_loss=0.0001401, whisper_loss=0.08972, over 3460710.49 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:58:32,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4450890.0, ans=0.125
2024-08-19 16:58:51,104 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 18 from LS+wenet, 13 from Vox, 21 fro AS
2024-08-19 16:58:59,426 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 18 from LS+wenet, 28 from Vox, 31 fro AS
2024-08-19 16:59:09,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4451090.0, ans=0.125
2024-08-19 16:59:13,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.25 vs.
limit=15.0
2024-08-19 16:59:17,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4451190.0, ans=0.125
2024-08-19 16:59:25,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4451190.0, ans=0.07
2024-08-19 16:59:36,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.363e+01 2.706e+01 3.045e+01 5.521e+02, threshold=5.412e+01, percent-clipped=1.0
2024-08-19 16:59:49,871 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 550, loss[loss=0.09259, beats_loss=0.009316, ecapa_loss=0.000212, whisper_loss=0.08115, over 14686.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01015, ecapa_loss=0.0001408, whisper_loss=0.08979, over 3523747.79 frames. ], batch size: 64, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:59:55,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4451390.0, ans=0.125
2024-08-19 17:00:13,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4451390.0, ans=0.0
2024-08-19 17:00:19,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4451490.0, ans=0.2
2024-08-19 17:00:20,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4451490.0, ans=0.2
2024-08-19 17:00:34,595 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 14 from LS+wenet, 23 from Vox, 32 fro AS
2024-08-19 17:00:38,171 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts.
17 from LS+wenet, 14 from Vox, 20 fro AS
2024-08-19 17:00:43,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4451590.0, ans=0.09899494936611666
2024-08-19 17:00:43,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4451590.0, ans=0.0
2024-08-19 17:00:57,767 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 16 from LS+wenet, 17 from Vox, 36 fro AS
2024-08-19 17:00:59,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4451690.0, ans=0.0
2024-08-19 17:01:09,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4451790.0, ans=0.0
2024-08-19 17:01:21,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4451790.0, ans=0.125
2024-08-19 17:01:21,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4451790.0, ans=0.0
2024-08-19 17:01:23,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4451790.0, ans=0.125
2024-08-19 17:01:24,728 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS
2024-08-19 17:01:26,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4451890.0, ans=0.1
2024-08-19 17:01:27,126 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 600, loss[loss=0.09706, beats_loss=0.01215, ecapa_loss=0.0001117, whisper_loss=0.0838, over 14181.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01014, ecapa_loss=0.0001398, whisper_loss=0.08989, over 3570876.23 frames.
], batch size: 54, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:01:30,788 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 29 from LS+wenet, 20 from Vox, 45 fro AS
2024-08-19 17:01:38,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4451890.0, ans=0.1
2024-08-19 17:01:43,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.72 vs. limit=10.0
2024-08-19 17:01:43,995 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 12 from LS+wenet, 11 from Vox, 26 fro AS
2024-08-19 17:01:48,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0
2024-08-19 17:01:58,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4451990.0, ans=0.0
2024-08-19 17:02:00,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4452090.0, ans=0.2
2024-08-19 17:02:04,755 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS
2024-08-19 17:02:05,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4452090.0, ans=0.125
2024-08-19 17:02:23,308 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts.
27 from LS+wenet, 24 from Vox, 39 fro AS
2024-08-19 17:02:23,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4452190.0, ans=0.0
2024-08-19 17:02:38,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.243e+01 2.476e+01 2.748e+01 6.280e+01, threshold=4.953e+01, percent-clipped=2.0
2024-08-19 17:02:39,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2024-08-19 17:02:48,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4452290.0, ans=0.07
2024-08-19 17:02:48,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0
2024-08-19 17:02:53,184 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 650, loss[loss=0.09948, beats_loss=0.01104, ecapa_loss=0.0001206, whisper_loss=0.08723, over 17604.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01015, ecapa_loss=0.0001396, whisper_loss=0.08971, over 3587169.10 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:02:53,769 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts.
20 from LS+wenet, 21 from Vox, 26 fro AS
2024-08-19 17:03:16,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4452490.0, ans=0.125
2024-08-19 17:03:21,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4452490.0, ans=0.125
2024-08-19 17:03:28,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4452590.0, ans=0.125
2024-08-19 17:03:28,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-08-19 17:03:37,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4452590.0, ans=0.0
2024-08-19 17:03:38,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4452590.0, ans=0.125
2024-08-19 17:03:40,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.85 vs. limit=22.5
2024-08-19 17:03:43,382 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 18 from LS+wenet, 14 from Vox, 35 fro AS
2024-08-19 17:04:00,834 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 21 from LS+wenet, 17 from Vox, 25 fro AS
2024-08-19 17:04:07,271 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS
2024-08-19 17:04:17,930 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 700, loss[loss=0.08437, beats_loss=0.008043, ecapa_loss=0.0001481, whisper_loss=0.07485, over 16775.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01018, ecapa_loss=0.0001404, whisper_loss=0.08907, over 3631256.19 frames.
], batch size: 66, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:04:28,580 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 17:05:02,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4453090.0, ans=0.0 2024-08-19 17:05:13,396 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 17:05:27,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.284e+01 2.551e+01 2.853e+01 6.068e+01, threshold=5.102e+01, percent-clipped=1.0 2024-08-19 17:05:41,204 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 750, loss[loss=0.0942, beats_loss=0.01076, ecapa_loss=0.0001328, whisper_loss=0.08211, over 20775.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01027, ecapa_loss=0.0001397, whisper_loss=0.08845, over 3669519.06 frames. ], batch size: 82, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:05:43,593 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 17:05:56,684 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:06:17,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4453590.0, ans=0.125 2024-08-19 17:06:21,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4453590.0, ans=0.125 2024-08-19 17:06:24,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4453590.0, ans=0.0 2024-08-19 17:06:29,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4453590.0, ans=0.0 2024-08-19 17:06:38,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4453690.0, ans=0.0 2024-08-19 17:06:57,192 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 17:07:07,256 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 800, loss[loss=0.1052, beats_loss=0.008687, ecapa_loss=0.0001551, whisper_loss=0.09498, over 17937.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01023, ecapa_loss=0.0001398, whisper_loss=0.08896, over 3664815.39 frames. ], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:07:10,130 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
32 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-19 17:07:13,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4453890.0, ans=0.1 2024-08-19 17:07:18,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4453890.0, ans=0.125 2024-08-19 17:07:25,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4453990.0, ans=0.0 2024-08-19 17:07:33,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4453990.0, ans=0.125 2024-08-19 17:07:35,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4453990.0, ans=0.0 2024-08-19 17:07:55,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4454090.0, ans=0.125 2024-08-19 17:08:13,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4454190.0, ans=0.125 2024-08-19 17:08:19,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.286e+01 2.525e+01 2.905e+01 4.318e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-19 17:08:32,940 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 850, loss[loss=0.0823, beats_loss=0.01342, ecapa_loss=0.0001371, whisper_loss=0.06751, over 15957.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01023, ecapa_loss=0.0001409, whisper_loss=0.08848, over 3655255.69 frames. 
], batch size: 64, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:08:38,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4454390.0, ans=0.125 2024-08-19 17:08:45,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4454390.0, ans=0.125 2024-08-19 17:09:40,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.93 vs. limit=6.0 2024-08-19 17:09:43,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4454790.0, ans=0.0 2024-08-19 17:09:49,946 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 29 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-19 17:09:55,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.19 vs. limit=22.5 2024-08-19 17:09:59,476 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 900, loss[loss=0.09361, beats_loss=0.01133, ecapa_loss=0.0001001, whisper_loss=0.08128, over 16250.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01021, ecapa_loss=0.0001396, whisper_loss=0.08875, over 3689104.25 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:10:00,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4454890.0, ans=0.125 2024-08-19 17:10:01,744 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 23 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 17:10:12,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4454890.0, ans=0.09899494936611666 2024-08-19 17:10:18,885 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
19 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 17:10:24,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4454990.0, ans=0.125 2024-08-19 17:10:40,155 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 17:10:53,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4455190.0, ans=0.2 2024-08-19 17:10:54,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2024-08-19 17:10:55,496 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 17:11:11,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.61 vs. limit=22.5 2024-08-19 17:11:11,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.270e+01 2.580e+01 3.222e+01 2.488e+02, threshold=5.161e+01, percent-clipped=3.0 2024-08-19 17:11:17,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2024-08-19 17:11:24,786 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 950, loss[loss=0.09129, beats_loss=0.01014, ecapa_loss=0.0001362, whisper_loss=0.07979, over 18396.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01017, ecapa_loss=0.0001398, whisper_loss=0.08913, over 3712876.94 frames. 
], batch size: 71, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:11:42,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4455490.0, ans=0.125 2024-08-19 17:11:48,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4455490.0, ans=0.95 2024-08-19 17:12:05,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4455590.0, ans=0.125 2024-08-19 17:12:11,810 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-19 17:12:16,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.86 vs. limit=10.0 2024-08-19 17:12:19,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4455690.0, ans=0.1 2024-08-19 17:12:22,105 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 17:12:46,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4455790.0, ans=0.125 2024-08-19 17:12:49,078 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1000, loss[loss=0.09994, beats_loss=0.01044, ecapa_loss=0.0001374, whisper_loss=0.08812, over 22530.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01028, ecapa_loss=0.0001393, whisper_loss=0.08862, over 3714325.93 frames. ], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:13:10,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4455990.0, ans=0.125 2024-08-19 17:13:25,811 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 17:13:29,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4456090.0, ans=0.015 2024-08-19 17:13:34,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4456090.0, ans=0.1 2024-08-19 17:13:58,452 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 31 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-19 17:13:59,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.231e+01 2.578e+01 2.915e+01 8.708e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 17:14:08,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4456290.0, ans=0.125 2024-08-19 17:14:11,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4456390.0, ans=0.1 2024-08-19 17:14:12,621 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1050, loss[loss=0.1233, beats_loss=0.009371, ecapa_loss=0.0001411, whisper_loss=0.1125, over 23478.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01025, ecapa_loss=0.0001393, whisper_loss=0.08879, over 3718331.23 frames. 
], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:14:18,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4456390.0, ans=0.125 2024-08-19 17:14:18,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4456390.0, ans=0.125 2024-08-19 17:14:23,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4456390.0, ans=0.1 2024-08-19 17:14:28,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-08-19 17:14:34,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2024-08-19 17:14:58,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4456590.0, ans=0.125 2024-08-19 17:15:11,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4456690.0, ans=0.125 2024-08-19 17:15:29,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4456790.0, ans=0.0 2024-08-19 17:15:31,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4456790.0, ans=0.0 2024-08-19 17:15:34,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4456790.0, ans=0.1 2024-08-19 17:15:37,036 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1100, loss[loss=0.1056, beats_loss=0.01231, ecapa_loss=0.0001113, whisper_loss=0.09214, over 19907.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01023, ecapa_loss=0.0001393, whisper_loss=0.08917, over 3705474.24 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:15:41,827 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 17:15:48,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4456890.0, ans=0.0 2024-08-19 17:16:31,105 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 23 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 17:16:48,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.222e+01 2.508e+01 2.804e+01 3.305e+02, threshold=5.015e+01, percent-clipped=2.0 2024-08-19 17:16:57,612 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 22 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-19 17:16:59,308 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 17:17:01,928 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1150, loss[loss=0.08977, beats_loss=0.00968, ecapa_loss=0.0001078, whisper_loss=0.07901, over 15929.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01016, ecapa_loss=0.000139, whisper_loss=0.08979, over 3697867.50 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:17:17,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4457390.0, ans=0.125 2024-08-19 17:17:32,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.38 vs. 
limit=15.0 2024-08-19 17:17:48,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4457590.0, ans=0.1 2024-08-19 17:18:01,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4457690.0, ans=0.07 2024-08-19 17:18:01,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-19 17:18:08,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4457790.0, ans=0.125 2024-08-19 17:18:13,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-19 17:18:18,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4457790.0, ans=0.0 2024-08-19 17:18:26,401 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1200, loss[loss=0.1037, beats_loss=0.0107, ecapa_loss=0.0001477, whisper_loss=0.09155, over 21183.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01027, ecapa_loss=0.0001398, whisper_loss=0.08957, over 3716048.77 frames. 
], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:18:30,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4457890.0, ans=0.125 2024-08-19 17:18:32,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4457890.0, ans=0.1 2024-08-19 17:18:37,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4457890.0, ans=0.5 2024-08-19 17:18:47,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4457990.0, ans=0.125 2024-08-19 17:19:15,135 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 20 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-19 17:19:17,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4458190.0, ans=0.125 2024-08-19 17:19:21,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2024-08-19 17:19:22,137 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 19 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 17:19:36,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.677e+01 2.243e+01 2.469e+01 2.791e+01 3.736e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-19 17:19:41,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4458290.0, ans=0.125 2024-08-19 17:19:50,493 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1250, loss[loss=0.0891, beats_loss=0.01098, ecapa_loss=0.0001344, whisper_loss=0.07677, over 21014.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001385, whisper_loss=0.08992, over 3720410.98 frames. ], batch size: 85, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:20:00,999 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 17:20:04,489 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 17:20:10,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4458490.0, ans=0.125 2024-08-19 17:20:13,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4458490.0, ans=0.2 2024-08-19 17:20:17,525 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 17:20:24,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-08-19 17:20:40,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4458690.0, ans=0.125 2024-08-19 17:20:44,927 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-19 17:20:52,375 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 17:21:07,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4458790.0, ans=0.125 2024-08-19 17:21:10,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4458790.0, ans=0.0 2024-08-19 17:21:15,039 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1300, loss[loss=0.1137, beats_loss=0.01043, ecapa_loss=0.0001386, whisper_loss=0.1019, over 22363.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001385, whisper_loss=0.08929, over 3741896.86 frames. ], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:21:20,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4458890.0, ans=0.125 2024-08-19 17:21:36,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4458990.0, ans=0.125 2024-08-19 17:21:55,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.42 vs. limit=10.0 2024-08-19 17:22:06,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4459190.0, ans=0.0 2024-08-19 17:22:21,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4459290.0, ans=0.1 2024-08-19 17:22:23,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.156e+01 2.325e+01 2.640e+01 4.207e+01, threshold=4.651e+01, percent-clipped=0.0 2024-08-19 17:22:30,155 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 19 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 17:22:37,397 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1350, loss[loss=0.1061, beats_loss=0.009605, ecapa_loss=0.0001086, whisper_loss=0.09538, over 16286.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.08926, over 3737538.39 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:22:42,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. 
limit=15.0 2024-08-19 17:22:48,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2024-08-19 17:22:50,103 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:22:53,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-19 17:23:09,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4459590.0, ans=0.1 2024-08-19 17:23:41,038 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 28 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 17:23:47,942 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 19 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 17:23:49,353 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 17:23:59,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-19 17:23:59,858 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1400, loss[loss=0.1074, beats_loss=0.007558, ecapa_loss=0.0001427, whisper_loss=0.09844, over 15730.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001378, whisper_loss=0.08897, over 3743284.93 frames. ], batch size: 58, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:24:05,776 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 17:24:14,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4459890.0, ans=0.125 2024-08-19 17:24:47,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4460090.0, ans=0.125 2024-08-19 17:25:02,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2024-08-19 17:25:11,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.246e+01 2.446e+01 2.816e+01 8.915e+01, threshold=4.891e+01, percent-clipped=2.0 2024-08-19 17:25:17,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4460290.0, ans=0.0 2024-08-19 17:25:17,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4460290.0, ans=0.125 2024-08-19 17:25:24,151 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1450, loss[loss=0.12, beats_loss=0.01058, ecapa_loss=0.0001367, whisper_loss=0.108, over 15459.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001378, whisper_loss=0.08947, over 3734914.92 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:25:43,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4460490.0, ans=0.0 2024-08-19 17:25:55,766 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 31 from LS+wenet, 30 from Vox, 24 fro AS 2024-08-19 17:26:00,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4460590.0, ans=0.125 2024-08-19 17:26:24,683 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 17:26:26,456 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 20 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 17:26:30,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4460690.0, ans=0.0 2024-08-19 17:26:46,815 WARNING [optim.py:496] (3/4) Scaling gradients by 0.051883164793252945, model_norm_threshold=48.91460418701172 2024-08-19 17:26:46,982 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.482e+04, grad_sumsq=2.578e+04, orig_rms_sq=3.290e+00 2024-08-19 17:26:53,315 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1500, loss[loss=0.09506, beats_loss=0.01268, ecapa_loss=0.0001041, whisper_loss=0.08135, over 23366.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001378, whisper_loss=0.089, over 3747417.28 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:26:59,242 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 20 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-19 17:27:35,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4461090.0, ans=0.0 2024-08-19 17:27:37,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4461090.0, ans=0.125 2024-08-19 17:27:40,408 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 17:28:06,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4461290.0, ans=0.1 2024-08-19 17:28:09,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.242e+01 2.515e+01 2.868e+01 9.428e+02, threshold=5.031e+01, percent-clipped=1.0 2024-08-19 17:28:15,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4461290.0, ans=0.07 2024-08-19 17:28:20,037 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-19 17:28:22,814 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1550, loss[loss=0.1016, beats_loss=0.009386, ecapa_loss=0.0001298, whisper_loss=0.09092, over 17609.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0104, ecapa_loss=0.0001375, whisper_loss=0.08893, over 3761054.97 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:28:29,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4461390.0, ans=0.125 2024-08-19 17:28:30,115 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 20 from LS+wenet, 14 from Vox, 55 fro AS 2024-08-19 17:28:47,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4461490.0, ans=0.1 2024-08-19 17:28:48,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4461490.0, ans=0.2 2024-08-19 17:29:16,511 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 14 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 17:29:35,128 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 17:29:37,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4461790.0, ans=0.0 2024-08-19 17:29:39,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4461790.0, ans=0.0 2024-08-19 17:29:45,894 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 23 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-19 17:29:49,283 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 17:29:50,173 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1600, loss[loss=0.1023, beats_loss=0.009434, ecapa_loss=0.0001462, whisper_loss=0.09145, over 22029.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01032, ecapa_loss=0.0001374, whisper_loss=0.08936, over 3772442.58 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:29:53,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4461890.0, ans=0.125 2024-08-19 17:29:56,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2024-08-19 17:30:00,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4461890.0, ans=0.1 2024-08-19 17:30:23,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.08 vs. limit=10.0 2024-08-19 17:30:30,915 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 17:30:34,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4462090.0, ans=0.1 2024-08-19 17:30:50,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4462190.0, ans=0.2 2024-08-19 17:30:57,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4462190.0, ans=0.0 2024-08-19 17:31:02,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4462290.0, ans=0.0 2024-08-19 17:31:03,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.243e+01 2.444e+01 2.645e+01 4.310e+01, threshold=4.888e+01, percent-clipped=0.0 2024-08-19 17:31:17,909 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1650, loss[loss=0.1016, beats_loss=0.01096, ecapa_loss=0.0001044, whisper_loss=0.08958, over 18544.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01017, ecapa_loss=0.0001376, whisper_loss=0.0898, over 3733628.28 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:31:26,437 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 23 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 17:31:41,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.06 vs. limit=5.0 2024-08-19 17:32:09,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-19 17:32:17,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.11 vs. 
limit=15.0 2024-08-19 17:32:21,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4462690.0, ans=0.125 2024-08-19 17:32:28,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4462790.0, ans=0.1 2024-08-19 17:32:33,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-19 17:32:38,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4462790.0, ans=0.0 2024-08-19 17:32:43,021 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1700, loss[loss=0.09832, beats_loss=0.01102, ecapa_loss=0.00013, whisper_loss=0.086, over 20059.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01024, ecapa_loss=0.000137, whisper_loss=0.08985, over 3733763.31 frames. ], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:32:59,633 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 21 from LS+wenet, 17 from Vox, 38 from AS 2024-08-19 17:33:13,501 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 15 from LS+wenet, 10 from Vox, 36 from AS 2024-08-19 17:33:27,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4463090.0, ans=0.125 2024-08-19 17:33:27,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4463090.0, ans=15.0 2024-08-19 17:33:36,647 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
19 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 17:33:45,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4463190.0, ans=0.2 2024-08-19 17:33:50,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4463290.0, ans=0.0 2024-08-19 17:33:54,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.265e+01 2.450e+01 2.804e+01 4.783e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-19 17:33:57,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.92 vs. limit=10.0 2024-08-19 17:33:59,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4463290.0, ans=0.1 2024-08-19 17:34:00,215 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 14 from LS+wenet, 11 from Vox, 26 from AS 2024-08-19 17:34:02,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4463290.0, ans=0.0 2024-08-19 17:34:03,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4463290.0, ans=10.0 2024-08-19 17:34:08,179 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1750, loss[loss=0.09341, beats_loss=0.01047, ecapa_loss=0.0001036, whisper_loss=0.08191, over 23487.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01025, ecapa_loss=0.0001358, whisper_loss=0.08918, over 3740079.26 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:34:23,654 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 from AS 2024-08-19 17:34:31,943 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
21 from LS+wenet, 25 from Vox, 23 from AS 2024-08-19 17:34:37,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2024-08-19 17:34:43,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4463590.0, ans=0.125 2024-08-19 17:34:56,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4463590.0, ans=0.2 2024-08-19 17:34:57,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4463690.0, ans=0.0 2024-08-19 17:35:12,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4463690.0, ans=0.125 2024-08-19 17:35:17,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4463790.0, ans=0.0 2024-08-19 17:35:17,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4463790.0, ans=0.05 2024-08-19 17:35:30,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4463890.0, ans=0.07 2024-08-19 17:35:31,774 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1800, loss[loss=0.09085, beats_loss=0.009596, ecapa_loss=0.0001575, whisper_loss=0.07968, over 17910.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01027, ecapa_loss=0.0001361, whisper_loss=0.0891, over 3752097.03 frames. 
], batch size: 74, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:35:37,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4463890.0, ans=10.0 2024-08-19 17:35:45,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=12.0 2024-08-19 17:35:58,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4463990.0, ans=0.125 2024-08-19 17:35:58,311 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:36:06,344 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 from AS 2024-08-19 17:36:11,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4464090.0, ans=0.125 2024-08-19 17:36:38,653 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 from AS 2024-08-19 17:36:40,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.242e+01 2.530e+01 2.809e+01 4.955e+01, threshold=5.060e+01, percent-clipped=1.0 2024-08-19 17:36:44,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4464290.0, ans=0.0 2024-08-19 17:36:53,845 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1850, loss[loss=0.1058, beats_loss=0.01022, ecapa_loss=0.0001558, whisper_loss=0.09404, over 21562.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0102, ecapa_loss=0.0001367, whisper_loss=0.08961, over 3709343.67 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:36:55,991 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
22 from LS+wenet, 14 from Vox, 15 from AS 2024-08-19 17:37:01,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-08-19 17:37:38,169 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 31 from LS+wenet, 23 from Vox, 31 from AS 2024-08-19 17:37:53,392 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 24 from LS+wenet, 21 from Vox, 41 from AS 2024-08-19 17:38:17,777 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1900, loss[loss=0.09179, beats_loss=0.01183, ecapa_loss=0.0001206, whisper_loss=0.07875, over 22988.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01025, ecapa_loss=0.000136, whisper_loss=0.08946, over 3757231.48 frames. ], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:38:45,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4464990.0, ans=0.0 2024-08-19 17:39:16,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4465190.0, ans=0.125 2024-08-19 17:39:20,452 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS 2024-08-19 17:39:28,278 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.273e+01 2.509e+01 2.742e+01 5.984e+01, threshold=5.017e+01, percent-clipped=1.0 2024-08-19 17:39:31,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-19 17:39:36,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2024-08-19 17:39:38,909 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
28 from LS+wenet, 22 from Vox, 28 from AS 2024-08-19 17:39:41,897 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 1950, loss[loss=0.1012, beats_loss=0.00973, ecapa_loss=0.0001358, whisper_loss=0.09007, over 20636.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0103, ecapa_loss=0.0001369, whisper_loss=0.08887, over 3723162.42 frames. ], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:39:58,356 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 from AS 2024-08-19 17:40:03,358 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS 2024-08-19 17:40:08,414 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 18 from LS+wenet, 9 from Vox, 27 from AS 2024-08-19 17:40:10,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4465490.0, ans=0.125 2024-08-19 17:40:32,057 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 from AS 2024-08-19 17:40:32,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4465690.0, ans=0.125 2024-08-19 17:41:06,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2024-08-19 17:41:07,331 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2000, loss[loss=0.09717, beats_loss=0.01007, ecapa_loss=0.0002006, whisper_loss=0.08509, over 21133.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.0104, ecapa_loss=0.0001372, whisper_loss=0.08854, over 3741633.54 frames. ], batch size: 95, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:41:08,340 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
19 from LS+wenet, 18 from Vox, 29 from AS 2024-08-19 17:41:15,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=12.0 2024-08-19 17:41:18,373 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.013e+01 2024-08-19 17:41:42,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4466090.0, ans=0.125 2024-08-19 17:41:56,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0 2024-08-19 17:42:06,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4466190.0, ans=0.125 2024-08-19 17:42:14,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2024-08-19 17:42:18,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.386e+01 2.592e+01 2.882e+01 2.246e+02, threshold=5.185e+01, percent-clipped=4.0 2024-08-19 17:42:31,613 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2050, loss[loss=0.07631, beats_loss=0.01157, ecapa_loss=0.0001358, whisper_loss=0.06339, over 14831.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01046, ecapa_loss=0.0001379, whisper_loss=0.08836, over 3717999.64 frames. 
], batch size: 61, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:42:49,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4466490.0, ans=0.125 2024-08-19 17:42:59,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4466490.0, ans=0.125 2024-08-19 17:43:05,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4466590.0, ans=0.0 2024-08-19 17:43:11,058 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS 2024-08-19 17:43:30,086 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 from AS 2024-08-19 17:43:45,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4466790.0, ans=0.125 2024-08-19 17:43:49,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4466790.0, ans=0.125 2024-08-19 17:43:58,916 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2100, loss[loss=0.09169, beats_loss=0.01088, ecapa_loss=0.000123, whisper_loss=0.07957, over 20017.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01044, ecapa_loss=0.0001371, whisper_loss=0.0887, over 3711781.41 frames. ], batch size: 80, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:44:19,188 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 from AS 2024-08-19 17:44:29,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. 
limit=10.0 2024-08-19 17:44:40,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4467090.0, ans=0.125 2024-08-19 17:45:10,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.301e+01 2.618e+01 2.880e+01 6.452e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-19 17:45:20,030 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 from AS 2024-08-19 17:45:23,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4467390.0, ans=0.2 2024-08-19 17:45:24,072 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2150, loss[loss=0.08008, beats_loss=0.01368, ecapa_loss=0.0001672, whisper_loss=0.06473, over 15628.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01035, ecapa_loss=0.000137, whisper_loss=0.08907, over 3678260.31 frames. ], batch size: 66, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:45:24,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4467390.0, ans=0.07 2024-08-19 17:45:44,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4467490.0, ans=10.0 2024-08-19 17:45:49,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4467490.0, ans=0.1 2024-08-19 17:45:52,715 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 21 from LS+wenet, 23 from Vox, 48 from AS 2024-08-19 17:45:59,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=4467590.0, ans=0.1 2024-08-19 17:46:02,790 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
24 from LS+wenet, 23 from Vox, 41 from AS 2024-08-19 17:46:15,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4467690.0, ans=0.0 2024-08-19 17:46:25,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4467690.0, ans=0.0 2024-08-19 17:46:49,316 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 from AS 2024-08-19 17:46:51,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4467890.0, ans=0.125 2024-08-19 17:46:51,891 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2200, loss[loss=0.1044, beats_loss=0.0114, ecapa_loss=0.000122, whisper_loss=0.09181, over 22833.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0103, ecapa_loss=0.0001391, whisper_loss=0.08957, over 3712227.99 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:46:55,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. 
limit=15.0 2024-08-19 17:47:04,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4467890.0, ans=0.125 2024-08-19 17:47:14,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4467990.0, ans=0.125 2024-08-19 17:48:00,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4468290.0, ans=0.5 2024-08-19 17:48:01,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4468290.0, ans=0.125 2024-08-19 17:48:04,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.319e+01 2.611e+01 2.846e+01 3.358e+02, threshold=5.223e+01, percent-clipped=1.0 2024-08-19 17:48:17,620 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2250, loss[loss=0.1243, beats_loss=0.01122, ecapa_loss=0.0001285, whisper_loss=0.1118, over 22638.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.08969, over 3705834.81 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:48:25,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4468390.0, ans=0.125 2024-08-19 17:48:35,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4468490.0, ans=0.0 2024-08-19 17:48:35,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4468490.0, ans=0.125 2024-08-19 17:48:41,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. 
limit=12.0 2024-08-19 17:49:04,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4468590.0, ans=0.125 2024-08-19 17:49:17,763 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 from AS 2024-08-19 17:49:29,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4468790.0, ans=0.1 2024-08-19 17:49:36,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4468790.0, ans=0.2 2024-08-19 17:49:42,747 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2300, loss[loss=0.1316, beats_loss=0.009098, ecapa_loss=0.0001374, whisper_loss=0.1211, over 23921.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001382, whisper_loss=0.09012, over 3702655.57 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:49:47,122 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 from AS 2024-08-19 17:50:03,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-08-19 17:50:28,241 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.068e+01 2024-08-19 17:50:33,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4469190.0, ans=0.0 2024-08-19 17:50:36,122 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 from AS 2024-08-19 17:50:44,566 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
24 from LS+wenet, 22 from Vox, 44 from AS 2024-08-19 17:50:53,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-19 17:50:54,329 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.331e+01 2.598e+01 2.979e+01 4.563e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-19 17:51:05,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2024-08-19 17:51:07,671 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2350, loss[loss=0.09517, beats_loss=0.0117, ecapa_loss=0.0001312, whisper_loss=0.08215, over 22546.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001377, whisper_loss=0.09043, over 3740502.97 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:51:15,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4469390.0, ans=0.04949747468305833 2024-08-19 17:51:20,423 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 16 from Vox, 41 from AS 2024-08-19 17:51:35,413 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 from AS 2024-08-19 17:51:35,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4469490.0, ans=0.04949747468305833 2024-08-19 17:51:54,149 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS 2024-08-19 17:52:09,938 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 
20 from LS+wenet, 15 from Vox, 17 from AS 2024-08-19 17:52:11,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4469690.0, ans=0.125 2024-08-19 17:52:33,341 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2400, loss[loss=0.09028, beats_loss=0.01186, ecapa_loss=0.0001568, whisper_loss=0.07684, over 22006.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001376, whisper_loss=0.09122, over 3749069.98 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:52:54,177 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-19 17:52:56,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4469990.0, ans=0.125 2024-08-19 17:53:16,684 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 17 from LS+wenet, 9 from Vox, 32 from AS 2024-08-19 17:53:16,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4470090.0, ans=0.125 2024-08-19 17:53:19,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4470090.0, ans=0.05 2024-08-19 17:53:46,702 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.352e+01 2.498e+01 2.702e+01 6.582e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-19 17:53:51,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4470290.0, ans=0.05 2024-08-19 17:54:01,062 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2450, loss[loss=0.08571, beats_loss=0.009326, ecapa_loss=0.0001501, whisper_loss=0.07488, over 21944.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001378, whisper_loss=0.0909, over 3820872.31 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:54:09,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4470390.0, ans=0.1 2024-08-19 17:54:21,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4470490.0, ans=0.125 2024-08-19 17:54:28,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4470490.0, ans=0.125 2024-08-19 17:54:38,904 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 27 from LS+wenet, 15 from Vox, 38 from AS 2024-08-19 17:54:47,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4470590.0, ans=0.125 2024-08-19 17:54:50,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4470590.0, ans=0.125 2024-08-19 17:54:59,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4470690.0, ans=0.2 2024-08-19 17:55:19,650 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 18 from Vox, 43 from AS 2024-08-19 17:55:27,640 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-19 17:55:27,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4470890.0, ans=0.125 2024-08-19 17:55:28,494 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2500, loss[loss=0.1038, beats_loss=0.01046, ecapa_loss=0.0001402, whisper_loss=0.09194, over 22733.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001384, whisper_loss=0.09106, over 3862788.09 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:55:31,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4470890.0, ans=0.0 2024-08-19 17:55:49,837 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 17 from LS+wenet, 17 from Vox, 33 from AS 2024-08-19 17:56:02,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2024-08-19 17:56:03,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4471090.0, ans=0.0 2024-08-19 17:56:06,698 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 from AS 2024-08-19 17:56:17,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4471090.0, ans=0.125 2024-08-19 17:56:25,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4471190.0, ans=0.125 2024-08-19 17:56:40,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.297e+01 2.536e+01 2.855e+01 4.497e+01, threshold=5.072e+01, percent-clipped=1.0 2024-08-19 17:56:42,923 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
30 from LS+wenet, 24 from Vox, 27 from AS 2024-08-19 17:56:44,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4471290.0, ans=0.125 2024-08-19 17:56:48,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4471290.0, ans=10.0 2024-08-19 17:56:51,345 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 from AS 2024-08-19 17:56:51,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=4471290.0, ans=0.025 2024-08-19 17:56:54,082 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2550, loss[loss=0.08038, beats_loss=0.01049, ecapa_loss=0.0001268, whisper_loss=0.06862, over 14002.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01037, ecapa_loss=0.0001377, whisper_loss=0.09111, over 3825100.60 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:56:54,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4471390.0, ans=0.0 2024-08-19 17:57:13,353 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 15 from LS+wenet, 18 from Vox, 26 from AS 2024-08-19 17:57:50,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2024-08-19 17:58:08,481 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 12 from LS+wenet, 20 from Vox, 27 from AS 2024-08-19 17:58:19,400 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2600, loss[loss=0.1159, beats_loss=0.008959, ecapa_loss=0.0001199, whisper_loss=0.1058, over 23091.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01031, ecapa_loss=0.0001381, whisper_loss=0.09126, over 3827867.70 frames. 
], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:58:32,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4471890.0, ans=0.0 2024-08-19 17:59:11,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4472190.0, ans=0.04949747468305833 2024-08-19 17:59:33,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.323e+01 2.520e+01 2.771e+01 4.731e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 17:59:43,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4472290.0, ans=0.2 2024-08-19 17:59:45,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.70 vs. limit=15.0 2024-08-19 17:59:47,286 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2650, loss[loss=0.09696, beats_loss=0.01027, ecapa_loss=0.0001203, whisper_loss=0.08549, over 16527.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01028, ecapa_loss=0.0001385, whisper_loss=0.09075, over 3825087.81 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:00:03,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4472390.0, ans=0.0 2024-08-19 18:00:12,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4472490.0, ans=0.1 2024-08-19 18:00:18,677 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 18:00:21,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.52 vs. 
limit=15.0 2024-08-19 18:00:50,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4472690.0, ans=0.2 2024-08-19 18:01:04,972 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.592e+05 2024-08-19 18:01:18,035 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2700, loss[loss=0.0879, beats_loss=0.01093, ecapa_loss=0.0001371, whisper_loss=0.07559, over 19285.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01029, ecapa_loss=0.000139, whisper_loss=0.09075, over 3811978.94 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:01:26,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=15.0 2024-08-19 18:01:31,078 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 15 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 18:02:01,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5 2024-08-19 18:02:25,195 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-19 18:02:30,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2024-08-19 18:02:31,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.406e+01 2.691e+01 2.981e+01 2.904e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-19 18:02:45,277 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2750, loss[loss=0.09381, beats_loss=0.00904, ecapa_loss=0.0001468, whisper_loss=0.0833, over 13005.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001387, whisper_loss=0.09013, over 3771679.53 frames. 
], batch size: 52, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:02:49,470 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 18:02:50,848 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 18:02:57,231 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 13 from LS+wenet, 9 from Vox, 30 fro AS 2024-08-19 18:03:39,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4473690.0, ans=0.1 2024-08-19 18:03:39,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4473690.0, ans=0.1 2024-08-19 18:04:05,195 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 29 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 18:04:05,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4473790.0, ans=0.125 2024-08-19 18:04:10,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4473790.0, ans=0.125 2024-08-19 18:04:13,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2024-08-19 18:04:13,816 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2800, loss[loss=0.1141, beats_loss=0.007961, ecapa_loss=0.000128, whisper_loss=0.1048, over 15427.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.08956, over 3743869.05 frames. ], batch size: 55, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:04:15,997 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 18:04:16,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4473890.0, ans=0.125 2024-08-19 18:04:22,702 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 18:04:27,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4473890.0, ans=0.1 2024-08-19 18:04:31,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2024-08-19 18:04:43,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4473990.0, ans=0.0 2024-08-19 18:04:50,423 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 18:04:59,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4474090.0, ans=0.125 2024-08-19 18:05:01,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4474090.0, ans=0.04949747468305833 2024-08-19 18:05:10,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4474190.0, ans=0.125 2024-08-19 18:05:14,087 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
32 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 18:05:25,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4474290.0, ans=0.5 2024-08-19 18:05:28,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.228e+01 2.485e+01 2.852e+01 2.973e+02, threshold=4.969e+01, percent-clipped=1.0 2024-08-19 18:05:36,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4474290.0, ans=0.125 2024-08-19 18:05:43,148 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2850, loss[loss=0.1208, beats_loss=0.009096, ecapa_loss=0.0001465, whisper_loss=0.1103, over 23029.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001393, whisper_loss=0.0899, over 3771228.42 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:05:57,733 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 18:06:03,340 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 28 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-19 18:06:16,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4474490.0, ans=0.0 2024-08-19 18:06:17,380 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 20 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-19 18:06:19,212 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 18:06:42,814 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 
22 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 18:06:51,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4474690.0, ans=0.125 2024-08-19 18:06:56,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4474790.0, ans=0.2 2024-08-19 18:07:00,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4474790.0, ans=0.125 2024-08-19 18:07:11,019 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2900, loss[loss=0.1004, beats_loss=0.008772, ecapa_loss=0.0001609, whisper_loss=0.09002, over 16975.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001404, whisper_loss=0.09063, over 3785426.73 frames. ], batch size: 71, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:07:20,859 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 18:07:27,693 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-19 18:07:35,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4474990.0, ans=0.125 2024-08-19 18:07:56,855 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 18:08:27,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.549e+01 2.227e+01 2.443e+01 2.748e+01 5.602e+01, threshold=4.887e+01, percent-clipped=1.0 2024-08-19 18:08:38,577 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 18:08:41,774 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 2950, loss[loss=0.09067, beats_loss=0.0126, ecapa_loss=0.000102, whisper_loss=0.07705, over 16146.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.0103, ecapa_loss=0.0001414, whisper_loss=0.09118, over 3782394.87 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:08:42,453 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 18:08:42,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4475390.0, ans=0.0 2024-08-19 18:08:57,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-19 18:09:04,328 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 18:09:05,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=22.5 2024-08-19 18:09:26,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-19 18:09:34,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-19 18:09:53,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4475790.0, ans=0.125 2024-08-19 18:10:02,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4475790.0, ans=0.125 2024-08-19 18:10:12,310 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3000, loss[loss=0.1203, beats_loss=0.006642, ecapa_loss=0.0001745, whisper_loss=0.1119, over 21021.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01024, ecapa_loss=0.0001419, whisper_loss=0.09125, over 3785400.33 frames. 
], batch size: 80, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:10:12,310 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-19 18:10:48,116 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on ASR_libri: loss=0.2543, beats_loss=0, ecapa_loss=0.0005052, whisper_loss=0.2492, over 931116.00 frames. 2024-08-19 18:11:09,427 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003946, beats_loss=0, ecapa_loss=0.0003946, whisper_loss=0, over 944235.00 frames. 2024-08-19 18:12:49,077 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on AT_audioset: loss=0.02308, beats_loss=0.02308, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 18:12:49,081 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-19 18:12:54,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4475890.0, ans=0.2 2024-08-19 18:13:14,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. 
limit=15.0 2024-08-19 18:13:33,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4476090.0, ans=0.125 2024-08-19 18:14:01,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4476290.0, ans=0.125 2024-08-19 18:14:02,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.325e+01 2.611e+01 2.866e+01 5.886e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-19 18:14:07,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4476290.0, ans=0.0 2024-08-19 18:14:16,852 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3050, loss[loss=0.09257, beats_loss=0.01271, ecapa_loss=0.0001288, whisper_loss=0.07856, over 15120.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001408, whisper_loss=0.09097, over 3791751.19 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:14:32,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4476390.0, ans=0.0 2024-08-19 18:14:41,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4476490.0, ans=0.125 2024-08-19 18:14:49,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4476490.0, ans=0.125 2024-08-19 18:15:06,550 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 18:15:40,936 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 18:15:50,663 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3100, loss[loss=0.09069, beats_loss=0.01287, ecapa_loss=0.000111, whisper_loss=0.07672, over 20026.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.0103, ecapa_loss=0.0001412, whisper_loss=0.09191, over 3838995.96 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:15:56,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4476890.0, ans=0.125 2024-08-19 18:16:01,231 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 18:16:35,209 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 18:16:35,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4477090.0, ans=0.2 2024-08-19 18:16:39,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4477090.0, ans=0.2 2024-08-19 18:16:40,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4477090.0, ans=0.2 2024-08-19 18:16:42,303 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 24 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-19 18:16:47,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. 
limit=22.5 2024-08-19 18:16:50,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4477190.0, ans=0.2 2024-08-19 18:16:50,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4477190.0, ans=0.125 2024-08-19 18:16:52,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4477190.0, ans=0.2 2024-08-19 18:16:53,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4477190.0, ans=0.125 2024-08-19 18:16:57,919 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 18:17:07,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.384e+01 2.601e+01 2.875e+01 4.295e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-19 18:17:22,603 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3150, loss[loss=0.1153, beats_loss=0.01095, ecapa_loss=0.0001477, whisper_loss=0.1029, over 22164.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.0908, over 3842139.54 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:17:32,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4477390.0, ans=0.0 2024-08-19 18:17:52,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2024-08-19 18:18:04,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4477590.0, ans=0.125 2024-08-19 18:18:27,620 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-19 18:18:39,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.20 vs. limit=10.0 2024-08-19 18:18:50,956 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 18:18:51,751 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.180e-02 2024-08-19 18:18:52,530 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3200, loss[loss=0.09894, beats_loss=0.009783, ecapa_loss=0.0001588, whisper_loss=0.08757, over 21737.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.09051, over 3818311.36 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:18:59,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4477890.0, ans=0.125 2024-08-19 18:19:14,623 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 18:19:25,414 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-19 18:19:31,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4478090.0, ans=0.125 2024-08-19 18:19:32,585 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 18:19:33,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4478090.0, ans=0.0 2024-08-19 18:19:35,989 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 36 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 18:19:39,916 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 18:19:49,006 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 24 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 18:20:07,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.317e+01 2.495e+01 2.835e+01 3.728e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-19 18:20:21,894 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3250, loss[loss=0.1059, beats_loss=0.009956, ecapa_loss=0.0001141, whisper_loss=0.09477, over 21051.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.09092, over 3846021.90 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:20:24,407 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 18:20:52,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4478490.0, ans=0.1 2024-08-19 18:20:58,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4478590.0, ans=0.125 2024-08-19 18:21:00,453 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 18:21:29,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4478690.0, ans=15.0 2024-08-19 18:21:31,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4478690.0, ans=0.1 2024-08-19 18:21:54,153 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3300, loss[loss=0.09982, beats_loss=0.009806, ecapa_loss=0.000162, whisper_loss=0.08839, over 12555.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.0001419, whisper_loss=0.09116, over 3842845.12 frames. 
], batch size: 50, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:21:56,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4478890.0, ans=0.125 2024-08-19 18:22:05,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4478890.0, ans=0.125 2024-08-19 18:22:06,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.00 vs. limit=15.0 2024-08-19 18:22:07,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4478890.0, ans=0.125 2024-08-19 18:22:13,371 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 18:22:18,693 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 18:22:22,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2024-08-19 18:22:37,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-08-19 18:22:46,799 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 
20 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-19 18:22:47,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4479190.0, ans=0.0 2024-08-19 18:22:51,653 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:23:07,892 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.244e+01 2.479e+01 2.933e+01 3.930e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-19 18:23:10,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4479290.0, ans=0.05 2024-08-19 18:23:12,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4479290.0, ans=0.0 2024-08-19 18:23:14,353 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 34 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 18:23:23,229 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3350, loss[loss=0.08098, beats_loss=0.008956, ecapa_loss=0.0001043, whisper_loss=0.07098, over 15027.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01044, ecapa_loss=0.0001426, whisper_loss=0.09103, over 3862541.50 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:23:23,876 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-19 18:23:55,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4479490.0, ans=0.0 2024-08-19 18:23:59,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4479590.0, ans=0.07 2024-08-19 18:24:09,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4479590.0, ans=0.125 2024-08-19 18:24:12,245 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 18:24:36,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4479790.0, ans=0.125 2024-08-19 18:24:36,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4479790.0, ans=0.125 2024-08-19 18:24:40,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4479790.0, ans=0.0 2024-08-19 18:24:52,567 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3400, loss[loss=0.07655, beats_loss=0.01096, ecapa_loss=0.0001712, whisper_loss=0.06387, over 16051.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001429, whisper_loss=0.09072, over 3831786.58 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:25:00,565 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 18:25:29,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4480090.0, ans=0.125 2024-08-19 18:25:30,145 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 18:25:37,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4480090.0, ans=0.125 2024-08-19 18:25:39,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4480090.0, ans=0.125 2024-08-19 18:25:41,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4480090.0, ans=0.125 2024-08-19 18:25:55,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4480190.0, ans=0.125 2024-08-19 18:26:06,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4480290.0, ans=0.05 2024-08-19 18:26:09,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.304e+01 2.534e+01 2.812e+01 7.025e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-19 18:26:14,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4480290.0, ans=0.125 2024-08-19 18:26:20,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4480290.0, ans=0.125 2024-08-19 18:26:24,789 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3450, loss[loss=0.08231, beats_loss=0.01185, ecapa_loss=0.0001309, whisper_loss=0.06916, over 12827.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001418, whisper_loss=0.09129, over 3852992.73 frames. ], batch size: 52, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:26:34,120 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
17 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-19 18:26:39,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4480390.0, ans=10.0 2024-08-19 18:26:45,967 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 18:26:51,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4480490.0, ans=0.09899494936611666 2024-08-19 18:27:03,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4480590.0, ans=0.1 2024-08-19 18:27:24,836 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 12 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 18:27:42,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4480790.0, ans=0.1 2024-08-19 18:27:47,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4480790.0, ans=0.1 2024-08-19 18:27:50,100 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3500, loss[loss=0.0918, beats_loss=0.01016, ecapa_loss=0.0001335, whisper_loss=0.0803, over 16939.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001423, whisper_loss=0.08999, over 3841945.98 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:28:06,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4480990.0, ans=0.125 2024-08-19 18:28:17,297 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 26 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-19 18:28:32,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.34 vs. 
limit=15.0 2024-08-19 18:28:32,956 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 30 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-19 18:28:38,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4481090.0, ans=0.1 2024-08-19 18:28:45,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2024-08-19 18:29:01,329 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.232e+01 2.458e+01 2.911e+01 6.376e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-19 18:29:05,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4481290.0, ans=0.125 2024-08-19 18:29:14,277 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3550, loss[loss=0.09894, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08712, over 22130.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001416, whisper_loss=0.08948, over 3818612.95 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:29:31,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4481490.0, ans=0.1 2024-08-19 18:29:42,227 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 13 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-19 18:29:52,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4481590.0, ans=0.1 2024-08-19 18:30:05,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4481690.0, ans=0.125 2024-08-19 18:30:16,052 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
29 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 18:30:16,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4481690.0, ans=0.09899494936611666 2024-08-19 18:30:34,758 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3600, loss[loss=0.09348, beats_loss=0.009758, ecapa_loss=0.0001708, whisper_loss=0.08202, over 18160.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.0001414, whisper_loss=0.08887, over 3757151.16 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:30:44,480 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 22 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 18:31:07,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4482090.0, ans=0.125 2024-08-19 18:31:25,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4482190.0, ans=0.0 2024-08-19 18:31:28,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4482190.0, ans=0.0 2024-08-19 18:31:28,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2024-08-19 18:31:34,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=12.0 2024-08-19 18:31:42,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.192e+01 2.433e+01 2.584e+01 3.997e+01, threshold=4.865e+01, percent-clipped=0.0 2024-08-19 18:31:54,783 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3650, loss[loss=0.1002, beats_loss=0.01195, ecapa_loss=0.0001511, whisper_loss=0.08677, over 19365.00 frames. 
], tot_loss[loss=0.1001, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.08818, over 3765540.67 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:32:02,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4482390.0, ans=0.1 2024-08-19 18:32:05,723 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 19 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-19 18:32:06,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0 2024-08-19 18:32:26,045 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 34 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 18:33:04,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4482790.0, ans=0.125 2024-08-19 18:33:07,479 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-19 18:33:11,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=12.0 2024-08-19 18:33:14,975 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3700, loss[loss=0.09911, beats_loss=0.01001, ecapa_loss=0.0001358, whisper_loss=0.08774, over 18243.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.08916, over 3791634.97 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:33:16,862 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:33:47,940 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
20 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 18:34:27,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4483290.0, ans=0.125 2024-08-19 18:34:30,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.276e+01 2.511e+01 2.757e+01 7.975e+01, threshold=5.022e+01, percent-clipped=3.0 2024-08-19 18:34:34,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4483290.0, ans=0.0 2024-08-19 18:34:38,627 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 18:34:42,966 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3750, loss[loss=0.09937, beats_loss=0.01111, ecapa_loss=0.0001225, whisper_loss=0.08704, over 17496.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001416, whisper_loss=0.08949, over 3783785.30 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:34:58,550 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 33 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-19 18:35:20,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4483590.0, ans=0.0 2024-08-19 18:35:22,429 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 28 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 18:35:28,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.30 vs. 
limit=15.0 2024-08-19 18:35:32,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4483690.0, ans=0.125 2024-08-19 18:35:32,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4483690.0, ans=0.0 2024-08-19 18:35:41,574 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 18:35:41,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4483690.0, ans=0.04949747468305833 2024-08-19 18:35:48,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4483790.0, ans=0.125 2024-08-19 18:35:49,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4483790.0, ans=0.125 2024-08-19 18:36:03,420 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3800, loss[loss=0.06557, beats_loss=0.01059, ecapa_loss=0.0001479, whisper_loss=0.0535, over 13806.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01054, ecapa_loss=0.0001406, whisper_loss=0.08903, over 3793140.46 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:36:05,145 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 32 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 18:36:17,912 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 18:36:30,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. 
limit=15.0 2024-08-19 18:36:41,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4484090.0, ans=0.1 2024-08-19 18:36:43,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4484090.0, ans=0.025 2024-08-19 18:36:43,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4484090.0, ans=0.0 2024-08-19 18:36:49,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4484190.0, ans=0.1 2024-08-19 18:36:51,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. limit=15.0 2024-08-19 18:36:52,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4484190.0, ans=0.125 2024-08-19 18:36:56,886 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 18:37:07,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4484290.0, ans=0.0 2024-08-19 18:37:09,431 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.312e+01 2.559e+01 2.923e+01 4.060e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:37:18,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4484290.0, ans=0.1 2024-08-19 18:37:22,524 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3850, loss[loss=0.1054, beats_loss=0.009475, ecapa_loss=0.0001669, whisper_loss=0.09429, over 17072.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001418, whisper_loss=0.09013, over 3810916.88 frames. 
], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:37:46,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-19 18:37:48,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4484490.0, ans=0.125 2024-08-19 18:38:19,291 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 18:38:40,577 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3900, loss[loss=0.1049, beats_loss=0.01066, ecapa_loss=0.0001258, whisper_loss=0.09302, over 19323.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001419, whisper_loss=0.08968, over 3812389.49 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:38:42,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4484890.0, ans=0.2 2024-08-19 18:38:45,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4484890.0, ans=0.125 2024-08-19 18:39:01,516 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 18:39:10,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4484990.0, ans=0.1 2024-08-19 18:39:11,959 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 18:39:30,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4485190.0, ans=0.125 2024-08-19 18:39:34,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. 
limit=15.0 2024-08-19 18:39:36,716 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 18:39:39,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-19 18:39:47,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4485290.0, ans=0.1 2024-08-19 18:39:47,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.325e+01 2.529e+01 2.804e+01 3.948e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 18:39:48,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4485290.0, ans=0.0 2024-08-19 18:39:53,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4485290.0, ans=0.125 2024-08-19 18:39:53,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4485290.0, ans=0.125 2024-08-19 18:39:54,144 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 18:40:00,845 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 3950, loss[loss=0.08902, beats_loss=0.01102, ecapa_loss=0.000118, whisper_loss=0.07682, over 15201.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001414, whisper_loss=0.08998, over 3835743.65 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:40:24,323 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 18:40:24,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4485490.0, ans=0.035 2024-08-19 18:40:34,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4485590.0, ans=0.125 2024-08-19 18:40:36,703 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 18:41:04,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4485790.0, ans=0.125 2024-08-19 18:41:04,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4485790.0, ans=0.1 2024-08-19 18:41:17,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4485790.0, ans=0.1 2024-08-19 18:41:19,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-19 18:41:21,227 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 18:41:21,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4485890.0, ans=0.0 2024-08-19 18:41:22,596 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4000, loss[loss=0.09518, beats_loss=0.01215, ecapa_loss=0.0001646, whisper_loss=0.08139, over 22029.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09036, over 3830693.49 frames. 
], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:41:26,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4485890.0, ans=0.125 2024-08-19 18:41:58,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4486090.0, ans=0.0 2024-08-19 18:42:06,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4486090.0, ans=0.125 2024-08-19 18:42:22,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-08-19 18:42:29,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.300e+01 2.585e+01 3.012e+01 4.802e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-19 18:42:31,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2024-08-19 18:42:35,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4486290.0, ans=0.0 2024-08-19 18:42:36,709 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-19 18:42:42,335 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4050, loss[loss=0.05726, beats_loss=0.01637, ecapa_loss=0.0001222, whisper_loss=0.03967, over 15048.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.09102, over 3841221.54 frames. 
], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:43:08,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4486490.0, ans=0.125 2024-08-19 18:43:14,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4486590.0, ans=0.125 2024-08-19 18:43:16,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2024-08-19 18:43:21,034 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 18:43:22,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4486590.0, ans=0.95 2024-08-19 18:43:33,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4486690.0, ans=0.125 2024-08-19 18:43:34,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0 2024-08-19 18:43:47,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4486790.0, ans=0.1 2024-08-19 18:44:01,538 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4100, loss[loss=0.08624, beats_loss=0.00913, ecapa_loss=0.0001336, whisper_loss=0.07578, over 15515.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001418, whisper_loss=0.0909, over 3860247.20 frames. 
], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:44:07,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4486890.0, ans=0.2 2024-08-19 18:44:08,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=4486890.0, ans=22.5 2024-08-19 18:44:17,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486990.0, ans=0.1 2024-08-19 18:44:31,198 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 18:44:33,381 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:44:33,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4487090.0, ans=0.0 2024-08-19 18:44:39,300 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 18:44:52,580 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 18:45:07,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.401e+01 2.726e+01 3.123e+01 1.504e+02, threshold=5.451e+01, percent-clipped=2.0 2024-08-19 18:45:13,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.70 vs. limit=15.0 2024-08-19 18:45:20,297 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4150, loss[loss=0.1022, beats_loss=0.01162, ecapa_loss=0.0001362, whisper_loss=0.08918, over 20830.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001409, whisper_loss=0.0911, over 3877030.13 frames. 
], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:45:44,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4487490.0, ans=0.125 2024-08-19 18:45:44,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4487490.0, ans=0.1 2024-08-19 18:45:46,021 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 18:45:47,829 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 16 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 18:45:57,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4487590.0, ans=0.0 2024-08-19 18:46:00,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4487590.0, ans=0.07 2024-08-19 18:46:10,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4487690.0, ans=0.125 2024-08-19 18:46:11,470 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 18:46:11,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4487690.0, ans=0.125 2024-08-19 18:46:28,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4487790.0, ans=0.125 2024-08-19 18:46:31,259 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
13 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 18:46:33,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4487790.0, ans=0.125 2024-08-19 18:46:40,455 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4200, loss[loss=0.1221, beats_loss=0.00825, ecapa_loss=0.0001483, whisper_loss=0.1124, over 23174.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001415, whisper_loss=0.09094, over 3843394.92 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:46:56,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4487990.0, ans=0.95 2024-08-19 18:47:11,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4487990.0, ans=0.2 2024-08-19 18:47:13,097 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 18:47:23,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4488090.0, ans=0.0 2024-08-19 18:47:29,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4488190.0, ans=0.125 2024-08-19 18:47:31,423 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 32 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 18:47:49,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.234e+01 2.488e+01 2.803e+01 1.323e+02, threshold=4.977e+01, percent-clipped=2.0 2024-08-19 18:47:56,937 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 23 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 18:48:02,447 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4250, loss[loss=0.1148, beats_loss=0.01047, ecapa_loss=0.0001573, whisper_loss=0.1027, over 21678.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001424, whisper_loss=0.09094, over 3862570.67 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:48:06,042 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 25 from Vox, 18 fro AS 2024-08-19 18:48:14,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4488390.0, ans=0.0 2024-08-19 18:48:32,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4488490.0, ans=0.125 2024-08-19 18:48:54,472 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 18:49:22,719 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4300, loss[loss=0.1197, beats_loss=0.00988, ecapa_loss=0.0001365, whisper_loss=0.1084, over 23317.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.000142, whisper_loss=0.09069, over 3846494.66 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:50:01,969 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 18:50:03,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.32 vs. limit=10.0 2024-08-19 18:50:17,749 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 18:50:30,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.303e+01 2.487e+01 2.877e+01 4.114e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-19 18:50:31,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4489290.0, ans=0.125 2024-08-19 18:50:38,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4489290.0, ans=0.0 2024-08-19 18:50:43,621 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4350, loss[loss=0.1151, beats_loss=0.00796, ecapa_loss=0.0001305, whisper_loss=0.1059, over 18107.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001405, whisper_loss=0.09018, over 3850030.93 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:50:57,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4489390.0, ans=0.125 2024-08-19 18:51:04,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2024-08-19 18:51:11,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4489490.0, ans=0.1 2024-08-19 18:51:15,267 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 20 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 18:51:37,208 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-19 18:51:37,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4489690.0, ans=0.0 2024-08-19 18:51:47,844 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
15 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 18:51:51,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4489790.0, ans=0.0 2024-08-19 18:51:52,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4489790.0, ans=0.2 2024-08-19 18:51:54,303 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 18:51:55,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-19 18:52:03,952 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4400, loss[loss=0.1086, beats_loss=0.009681, ecapa_loss=0.0001843, whisper_loss=0.09709, over 21983.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001411, whisper_loss=0.08964, over 3825338.19 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:52:22,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4489990.0, ans=0.0 2024-08-19 18:52:36,531 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 19 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-19 18:53:00,434 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 35 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-19 18:53:11,431 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.271e+01 2.455e+01 2.760e+01 4.090e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-19 18:53:12,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4490290.0, ans=0.125 2024-08-19 18:53:23,765 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4450, loss[loss=0.09032, beats_loss=0.01241, ecapa_loss=0.000131, whisper_loss=0.07661, over 21528.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001408, whisper_loss=0.09013, over 3826704.16 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:53:45,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4490490.0, ans=0.1 2024-08-19 18:53:52,785 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-19 18:54:10,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5 2024-08-19 18:54:19,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.70 vs. limit=10.0 2024-08-19 18:54:22,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=15.0 2024-08-19 18:54:30,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4490790.0, ans=0.0 2024-08-19 18:54:32,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4490790.0, ans=0.1 2024-08-19 18:54:32,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4490790.0, ans=0.1 2024-08-19 18:54:33,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4490790.0, ans=0.2 2024-08-19 18:54:37,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4490790.0, ans=0.125 2024-08-19 18:54:38,539 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 18:54:41,878 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 14 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 18:54:45,076 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4500, loss[loss=0.1092, beats_loss=0.009325, ecapa_loss=0.0001143, whisper_loss=0.09874, over 19461.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001404, whisper_loss=0.08958, over 3792513.90 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:54:47,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4490890.0, ans=0.125 2024-08-19 18:54:55,636 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 18:55:33,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4491090.0, ans=0.125 2024-08-19 18:55:33,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4491090.0, ans=0.125 2024-08-19 18:55:50,869 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 18:55:54,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+01 2.263e+01 2.559e+01 2.809e+01 3.466e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:56:07,821 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4550, loss[loss=0.1301, beats_loss=0.00896, ecapa_loss=0.0001409, whisper_loss=0.1198, over 19725.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01058, ecapa_loss=0.0001405, whisper_loss=0.08936, over 3807054.01 frames. 
], batch size: 76, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:56:17,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4491390.0, ans=0.125 2024-08-19 18:56:42,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4491590.0, ans=0.125 2024-08-19 18:56:55,875 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 26 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-19 18:57:06,105 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 18:57:19,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4491790.0, ans=0.0 2024-08-19 18:57:21,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4491790.0, ans=0.125 2024-08-19 18:57:33,000 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4600, loss[loss=0.1035, beats_loss=0.009983, ecapa_loss=0.0001542, whisper_loss=0.09195, over 20121.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001405, whisper_loss=0.08956, over 3797296.48 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:57:37,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4491890.0, ans=0.2 2024-08-19 18:57:37,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4491890.0, ans=0.125 2024-08-19 18:57:44,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2024-08-19 18:57:48,803 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
18 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-19 18:58:05,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2024-08-19 18:58:06,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4492090.0, ans=0.04949747468305833 2024-08-19 18:58:10,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4492090.0, ans=0.1 2024-08-19 18:58:46,007 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.297e+01 2.492e+01 2.828e+01 4.082e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-19 18:58:47,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4492290.0, ans=0.125 2024-08-19 18:58:47,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4492290.0, ans=0.2 2024-08-19 18:58:57,929 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4650, loss[loss=0.1046, beats_loss=0.01004, ecapa_loss=0.0001458, whisper_loss=0.0931, over 23262.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001406, whisper_loss=0.0894, over 3821929.26 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:59:09,011 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 18:59:19,594 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 20 from LS+wenet, 21 from Vox, 12 fro AS 2024-08-19 18:59:24,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-08-19 18:59:42,396 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 18:59:42,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4492590.0, ans=0.2 2024-08-19 18:59:46,038 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 18:59:46,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2024-08-19 18:59:51,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0 2024-08-19 19:00:04,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4492790.0, ans=0.2 2024-08-19 19:00:22,872 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4700, loss[loss=0.07887, beats_loss=0.01395, ecapa_loss=9.365e-05, whisper_loss=0.06398, over 16121.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001404, whisper_loss=0.08964, over 3816150.85 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:00:23,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4492890.0, ans=0.125 2024-08-19 19:00:45,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4492990.0, ans=0.125 2024-08-19 19:00:47,123 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 19:00:52,006 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 10 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 19:00:55,618 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 19:00:59,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4493090.0, ans=0.0 2024-08-19 19:01:05,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4493090.0, ans=0.125 2024-08-19 19:01:15,688 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 23 from LS+wenet, 33 from Vox, 30 fro AS 2024-08-19 19:01:30,362 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 15 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-19 19:01:30,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4493290.0, ans=0.125 2024-08-19 19:01:32,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4493290.0, ans=0.125 2024-08-19 19:01:34,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.345e+01 2.552e+01 2.786e+01 4.462e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-19 19:01:35,358 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 28 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 19:01:45,844 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4750, loss[loss=0.08587, beats_loss=0.01275, ecapa_loss=0.0001541, whisper_loss=0.07158, over 14549.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001414, whisper_loss=0.0898, over 3816269.69 frames. 
], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:01:50,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4493390.0, ans=0.0 2024-08-19 19:01:53,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4493390.0, ans=0.125 2024-08-19 19:01:53,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4493390.0, ans=0.2 2024-08-19 19:02:41,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4493690.0, ans=0.0 2024-08-19 19:02:41,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0 2024-08-19 19:02:46,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2024-08-19 19:02:48,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4493690.0, ans=0.09899494936611666 2024-08-19 19:02:52,370 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 19:03:03,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4493790.0, ans=0.015 2024-08-19 19:03:09,644 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4800, loss[loss=0.1137, beats_loss=0.009173, ecapa_loss=0.0001283, whisper_loss=0.1032, over 23032.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001415, whisper_loss=0.09035, over 3829688.14 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:03:14,558 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
35 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 19:03:16,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4493890.0, ans=0.125 2024-08-19 19:03:18,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4493890.0, ans=0.125 2024-08-19 19:03:20,956 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 19:03:47,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.13 vs. limit=22.5 2024-08-19 19:04:02,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4494190.0, ans=0.0 2024-08-19 19:04:21,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.336e+01 2.600e+01 2.820e+01 4.344e+01, threshold=5.200e+01, percent-clipped=0.0 2024-08-19 19:04:33,257 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4850, loss[loss=0.1188, beats_loss=0.008162, ecapa_loss=0.0001532, whisper_loss=0.1091, over 21462.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001413, whisper_loss=0.0901, over 3804003.04 frames. ], batch size: 83, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:04:36,543 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 29 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 19:05:07,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4494590.0, ans=0.2 2024-08-19 19:05:56,442 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4900, loss[loss=0.09973, beats_loss=0.00884, ecapa_loss=0.0001925, whisper_loss=0.08896, over 14104.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001411, whisper_loss=0.08973, over 3790429.47 frames. 
], batch size: 58, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:06:00,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4494890.0, ans=0.95 2024-08-19 19:06:04,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4494890.0, ans=0.125 2024-08-19 19:06:27,929 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-19 19:06:29,024 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 19:06:33,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4495090.0, ans=0.1 2024-08-19 19:06:38,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4495090.0, ans=0.125 2024-08-19 19:06:41,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4495090.0, ans=0.2 2024-08-19 19:06:50,238 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 19:06:56,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2024-08-19 19:07:03,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.87 vs. 
limit=12.0 2024-08-19 19:07:04,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4495290.0, ans=0.0 2024-08-19 19:07:10,331 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.336e+01 2.530e+01 2.860e+01 1.367e+02, threshold=5.061e+01, percent-clipped=1.0 2024-08-19 19:07:12,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4495290.0, ans=0.125 2024-08-19 19:07:15,746 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 19:07:22,622 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 4950, loss[loss=0.08228, beats_loss=0.01119, ecapa_loss=0.0001408, whisper_loss=0.06968, over 16623.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001409, whisper_loss=0.08923, over 3810697.16 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:07:29,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4495390.0, ans=0.125 2024-08-19 19:07:29,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-19 19:07:34,658 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 19:07:42,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4495490.0, ans=0.0 2024-08-19 19:07:54,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. 
limit=6.0 2024-08-19 19:08:09,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-08-19 19:08:49,526 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5000, loss[loss=0.125, beats_loss=0.01047, ecapa_loss=0.000149, whisper_loss=0.113, over 15605.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001407, whisper_loss=0.09003, over 3805766.63 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:09:09,935 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 30 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 19:09:15,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4495990.0, ans=0.125 2024-08-19 19:09:49,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4496190.0, ans=0.125 2024-08-19 19:10:00,147 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:10:04,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.356e+01 2.547e+01 2.785e+01 7.027e+01, threshold=5.094e+01, percent-clipped=1.0 2024-08-19 19:10:16,960 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5050, loss[loss=0.08778, beats_loss=0.01196, ecapa_loss=0.0001396, whisper_loss=0.07442, over 21014.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001406, whisper_loss=0.08986, over 3832651.36 frames. 
], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:10:20,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4496390.0, ans=0.125 2024-08-19 19:10:22,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.47 vs. limit=10.0 2024-08-19 19:10:27,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4496390.0, ans=0.1 2024-08-19 19:10:35,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4496490.0, ans=0.2 2024-08-19 19:10:51,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.00 vs. limit=10.0 2024-08-19 19:11:10,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4496690.0, ans=0.2 2024-08-19 19:11:23,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4496790.0, ans=0.2 2024-08-19 19:11:42,233 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5100, loss[loss=0.1043, beats_loss=0.00947, ecapa_loss=0.0001021, whisper_loss=0.09379, over 14527.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001408, whisper_loss=0.09001, over 3794978.14 frames. ], batch size: 51, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:12:12,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4496990.0, ans=0.125 2024-08-19 19:12:24,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. 
limit=15.0 2024-08-19 19:12:42,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4497190.0, ans=0.0 2024-08-19 19:12:48,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-08-19 19:12:53,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.293e+01 2.514e+01 2.831e+01 4.907e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-19 19:13:02,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4497290.0, ans=0.125 2024-08-19 19:13:05,691 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5150, loss[loss=0.1227, beats_loss=0.007523, ecapa_loss=0.0001736, whisper_loss=0.1134, over 22570.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001423, whisper_loss=0.08989, over 3812501.10 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:13:35,973 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 19:13:38,780 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 19:13:41,353 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 37 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 19:13:52,370 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-19 19:14:04,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4497690.0, ans=0.1 2024-08-19 19:14:12,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4497690.0, ans=0.125 2024-08-19 19:14:23,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4497790.0, ans=0.1 2024-08-19 19:14:25,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4497790.0, ans=0.125 2024-08-19 19:14:33,137 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5200, loss[loss=0.1022, beats_loss=0.01257, ecapa_loss=0.0001227, whisper_loss=0.08837, over 16151.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.09071, over 3831182.19 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:14:35,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4497890.0, ans=0.125 2024-08-19 19:14:47,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2024-08-19 19:14:51,053 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 19:14:54,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4497990.0, ans=0.125 2024-08-19 19:14:59,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.48 vs. 
limit=15.0 2024-08-19 19:15:00,791 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 19:15:16,660 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:15:21,041 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 19:15:36,360 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 19:15:41,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4498290.0, ans=0.125 2024-08-19 19:15:46,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.272e+01 2.588e+01 2.852e+01 4.438e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-19 19:15:51,992 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.555e+00 2024-08-19 19:15:57,706 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5250, loss[loss=0.1007, beats_loss=0.01088, ecapa_loss=0.0001348, whisper_loss=0.08846, over 18481.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.091, over 3841768.00 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:15:58,132 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
19 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 19:16:16,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4498490.0, ans=0.125 2024-08-19 19:16:28,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4498490.0, ans=0.125 2024-08-19 19:16:33,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4498590.0, ans=0.07 2024-08-19 19:16:46,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4498690.0, ans=0.1 2024-08-19 19:17:01,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=15.0 2024-08-19 19:17:12,546 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 11 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-19 19:17:17,715 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.456e+00 2024-08-19 19:17:19,626 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5300, loss[loss=0.1061, beats_loss=0.01289, ecapa_loss=0.0001031, whisper_loss=0.09214, over 23895.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001409, whisper_loss=0.09104, over 3838765.65 frames. 
], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:17:25,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4498890.0, ans=0.0 2024-08-19 19:17:34,193 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07655883580446243, model_norm_threshold=51.76279067993164 2024-08-19 19:17:34,360 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.480e+05, grad_sumsq=1.406e+07, orig_rms_sq=1.053e-02 2024-08-19 19:17:40,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4498990.0, ans=0.125 2024-08-19 19:17:43,020 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 13 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 19:17:55,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-08-19 19:18:03,965 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 19:18:14,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4499190.0, ans=0.125 2024-08-19 19:18:30,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.288e+01 2.528e+01 2.946e+01 6.761e+02, threshold=5.056e+01, percent-clipped=1.0 2024-08-19 19:18:42,122 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5350, loss[loss=0.09723, beats_loss=0.009971, ecapa_loss=0.0001387, whisper_loss=0.08587, over 20118.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.09004, over 3800422.61 frames. 
], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:18:57,365 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05797187611460686, model_norm_threshold=50.55705261230469 2024-08-19 19:18:57,536 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.0.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.406e+05, grad_sumsq=1.406e+05, orig_rms_sq=1.000e+00 2024-08-19 19:18:59,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4499490.0, ans=0.125 2024-08-19 19:18:59,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4499490.0, ans=0.125 2024-08-19 19:19:12,897 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:19:16,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4499490.0, ans=0.2 2024-08-19 19:19:22,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4499590.0, ans=0.2 2024-08-19 19:19:25,874 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 19:20:12,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4499890.0, ans=0.125 2024-08-19 19:20:13,377 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5400, loss[loss=0.09497, beats_loss=0.009183, ecapa_loss=0.0001362, whisper_loss=0.08442, over 19000.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001424, whisper_loss=0.08996, over 3821513.60 frames. 
], batch size: 75, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:20:19,775 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 19:20:27,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.34 vs. limit=22.5 2024-08-19 19:20:29,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4499990.0, ans=0.125 2024-08-19 19:20:33,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4499990.0, ans=0.0 2024-08-19 19:20:43,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4499990.0, ans=0.1 2024-08-19 19:20:43,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4499990.0, ans=0.125 2024-08-19 19:20:47,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=22.5 2024-08-19 19:20:58,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4500090.0, ans=0.0 2024-08-19 19:21:02,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4500090.0, ans=0.0 2024-08-19 19:21:27,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.273e+01 2.608e+01 3.002e+01 8.721e+02, threshold=5.217e+01, percent-clipped=3.0 2024-08-19 19:21:33,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4500290.0, ans=0.05 2024-08-19 19:21:37,861 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 
30 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 19:21:39,182 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5450, loss[loss=0.1139, beats_loss=0.008675, ecapa_loss=0.000155, whisper_loss=0.1037, over 21739.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.08944, over 3762327.37 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:21:42,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4500390.0, ans=0.0 2024-08-19 19:21:47,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4500390.0, ans=0.125 2024-08-19 19:21:52,057 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 21 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-19 19:21:56,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-08-19 19:22:08,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4500490.0, ans=0.1 2024-08-19 19:22:13,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4500590.0, ans=0.0 2024-08-19 19:22:17,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4500590.0, ans=0.1 2024-08-19 19:22:50,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.27 vs. limit=15.0 2024-08-19 19:22:52,786 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 19:22:59,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.10 vs. 
limit=12.0 2024-08-19 19:23:08,984 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5500, loss[loss=0.09341, beats_loss=0.01064, ecapa_loss=0.0001272, whisper_loss=0.0815, over 19174.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.08956, over 3750544.69 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:23:11,819 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=22.5 2024-08-19 19:23:25,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4500990.0, ans=0.0 2024-08-19 19:23:27,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4500990.0, ans=0.0 2024-08-19 19:24:04,323 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 19:24:25,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.236e+01 2.438e+01 2.713e+01 9.093e+01, threshold=4.875e+01, percent-clipped=1.0 2024-08-19 19:24:39,714 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5550, loss[loss=0.09629, beats_loss=0.01179, ecapa_loss=0.0001195, whisper_loss=0.0833, over 22689.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.08983, over 3785583.52 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:24:49,693 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
40 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 19:25:24,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4501590.0, ans=0.125 2024-08-19 19:25:31,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4501590.0, ans=0.125 2024-08-19 19:25:35,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4501590.0, ans=0.125 2024-08-19 19:25:50,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4501690.0, ans=0.0 2024-08-19 19:25:57,166 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 19:26:15,407 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5600, loss[loss=0.09053, beats_loss=0.01057, ecapa_loss=0.0001538, whisper_loss=0.07842, over 19154.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001423, whisper_loss=0.09043, over 3834037.88 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:26:19,591 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-19 19:26:34,687 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 19:27:00,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4502090.0, ans=0.0 2024-08-19 19:27:06,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.54 vs. limit=22.5 2024-08-19 19:27:11,506 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 
18 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-19 19:27:14,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4502190.0, ans=0.0 2024-08-19 19:27:16,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4502190.0, ans=0.125 2024-08-19 19:27:35,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4502290.0, ans=0.125 2024-08-19 19:27:37,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.290e+01 2.503e+01 2.698e+01 5.557e+01, threshold=5.007e+01, percent-clipped=1.0 2024-08-19 19:27:42,442 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 16 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-19 19:27:51,982 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5650, loss[loss=0.1072, beats_loss=0.009004, ecapa_loss=0.0001611, whisper_loss=0.09657, over 21872.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001433, whisper_loss=0.09076, over 3845543.54 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:28:47,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4502690.0, ans=0.1 2024-08-19 19:29:03,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2024-08-19 19:29:06,403 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 30 from Vox, 22 fro AS 2024-08-19 19:29:15,788 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 13 from LS+wenet, 38 from Vox, 33 fro AS 2024-08-19 19:29:27,598 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5700, loss[loss=0.09001, beats_loss=0.01034, ecapa_loss=0.0001514, whisper_loss=0.07816, over 18244.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01028, ecapa_loss=0.0001447, whisper_loss=0.09052, over 3817000.43 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:29:35,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2024-08-19 19:30:10,378 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 19 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 19:30:32,509 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 19:30:50,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.06 vs. limit=22.5 2024-08-19 19:30:51,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.276e+01 2.546e+01 2.979e+01 5.244e+01, threshold=5.092e+01, percent-clipped=1.0 2024-08-19 19:31:04,809 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5750, loss[loss=0.09096, beats_loss=0.009874, ecapa_loss=0.0001485, whisper_loss=0.0796, over 15947.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.000143, whisper_loss=0.09035, over 3805875.82 frames. 
], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:31:24,012 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:31:35,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4503490.0, ans=0.125 2024-08-19 19:31:41,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4503590.0, ans=0.125 2024-08-19 19:31:50,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4503590.0, ans=0.125 2024-08-19 19:32:10,369 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.108e+05 2024-08-19 19:32:11,332 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 19:32:25,188 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 19:32:35,821 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5800, loss[loss=0.07987, beats_loss=0.0135, ecapa_loss=0.0001291, whisper_loss=0.06507, over 18084.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001431, whisper_loss=0.09004, over 3829669.36 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:33:05,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4503990.0, ans=0.0 2024-08-19 19:33:10,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-19 19:33:23,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. 
limit=15.0 2024-08-19 19:33:38,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4504190.0, ans=0.2 2024-08-19 19:33:51,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4504290.0, ans=0.125 2024-08-19 19:33:53,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4504290.0, ans=0.0 2024-08-19 19:33:58,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.339e+01 2.561e+01 2.956e+01 4.463e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-19 19:34:06,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4504290.0, ans=0.1 2024-08-19 19:34:11,256 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5850, loss[loss=0.1151, beats_loss=0.01235, ecapa_loss=0.0001112, whisper_loss=0.1016, over 23252.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001424, whisper_loss=0.09033, over 3849898.00 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:34:15,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4504390.0, ans=0.0 2024-08-19 19:34:20,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4504390.0, ans=0.0 2024-08-19 19:34:20,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.29 vs. 
limit=12.0 2024-08-19 19:34:29,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4504490.0, ans=0.125 2024-08-19 19:34:50,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4504590.0, ans=0.1 2024-08-19 19:35:00,187 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 37 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 19:35:12,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4504690.0, ans=0.0 2024-08-19 19:35:21,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-08-19 19:35:26,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4504790.0, ans=0.0 2024-08-19 19:35:44,446 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5900, loss[loss=0.09573, beats_loss=0.01191, ecapa_loss=0.0001404, whisper_loss=0.08241, over 22569.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001425, whisper_loss=0.08935, over 3819653.39 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:35:44,640 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 17 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 19:35:49,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.75 vs. limit=15.0 2024-08-19 19:36:25,244 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 19:36:30,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4505090.0, ans=0.125 2024-08-19 19:36:34,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4505090.0, ans=0.0 2024-08-19 19:37:05,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4505290.0, ans=0.0 2024-08-19 19:37:05,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-08-19 19:37:06,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4505290.0, ans=0.2 2024-08-19 19:37:09,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.259e+01 2.428e+01 2.766e+01 1.765e+02, threshold=4.857e+01, percent-clipped=1.0 2024-08-19 19:37:12,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-08-19 19:37:14,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4505290.0, ans=0.07 2024-08-19 19:37:23,693 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 5950, loss[loss=0.09147, beats_loss=0.01388, ecapa_loss=0.0001339, whisper_loss=0.07625, over 21582.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001425, whisper_loss=0.09003, over 3808156.49 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:37:28,824 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
37 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-19 19:37:47,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4505490.0, ans=0.1 2024-08-19 19:37:59,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-19 19:38:09,179 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 19:38:12,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.69 vs. limit=10.0 2024-08-19 19:38:18,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4505590.0, ans=0.1 2024-08-19 19:38:25,405 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 19:38:39,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4505790.0, ans=0.0 2024-08-19 19:38:41,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4505790.0, ans=0.0 2024-08-19 19:38:43,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2024-08-19 19:38:52,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4505790.0, ans=0.0 2024-08-19 19:38:58,743 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6000, loss[loss=0.1088, beats_loss=0.01009, ecapa_loss=0.0001299, whisper_loss=0.09736, over 23383.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001424, whisper_loss=0.0904, over 3806798.64 frames. 
], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:38:58,743 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-19 19:39:35,557 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005172, whisper_loss=0.2488, over 931116.00 frames. 2024-08-19 19:39:57,481 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames. 2024-08-19 19:41:38,245 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 19:41:38,248 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-19 19:41:40,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.86 vs. limit=22.5 2024-08-19 19:42:02,477 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 23 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 19:42:22,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=4506090.0, ans=0.2 2024-08-19 19:42:23,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4506090.0, ans=0.2 2024-08-19 19:42:54,024 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
17 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 19:42:55,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.381e+01 2.694e+01 2.980e+01 4.120e+01, threshold=5.388e+01, percent-clipped=0.0 2024-08-19 19:43:01,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4506290.0, ans=0.2 2024-08-19 19:43:07,820 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6050, loss[loss=0.1033, beats_loss=0.008183, ecapa_loss=0.0001484, whisper_loss=0.09367, over 15598.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001431, whisper_loss=0.09022, over 3821289.72 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:43:32,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=15.0 2024-08-19 19:43:43,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4506590.0, ans=0.0 2024-08-19 19:44:32,189 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 19:44:37,714 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6100, loss[loss=0.1048, beats_loss=0.009228, ecapa_loss=0.0001372, whisper_loss=0.09421, over 18552.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001419, whisper_loss=0.08952, over 3798998.23 frames. 
], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:44:43,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=4506890.0, ans=0.02 2024-08-19 19:44:56,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4506990.0, ans=0.0 2024-08-19 19:45:02,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4506990.0, ans=0.0 2024-08-19 19:45:18,864 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 19:45:19,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4507090.0, ans=0.125 2024-08-19 19:45:29,413 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 24 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-19 19:45:33,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4507190.0, ans=0.125 2024-08-19 19:45:42,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4507190.0, ans=0.1 2024-08-19 19:45:44,292 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.740e+00 2024-08-19 19:45:52,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.249e+01 2.614e+01 2.889e+01 5.523e+01, threshold=5.228e+01, percent-clipped=1.0 2024-08-19 19:46:07,437 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6150, loss[loss=0.1118, beats_loss=0.009463, ecapa_loss=0.0001645, whisper_loss=0.1007, over 15718.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001413, whisper_loss=0.08999, over 3811354.97 frames. 
], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:46:08,365 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 29 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 19:46:22,335 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 15 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-19 19:46:23,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4507390.0, ans=0.0 2024-08-19 19:46:24,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4507490.0, ans=0.1 2024-08-19 19:46:31,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4507490.0, ans=0.2 2024-08-19 19:46:35,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4507490.0, ans=0.1 2024-08-19 19:46:44,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4507590.0, ans=0.125 2024-08-19 19:46:45,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4507590.0, ans=0.125 2024-08-19 19:47:02,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4507690.0, ans=0.125 2024-08-19 19:47:21,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4507790.0, ans=0.0 2024-08-19 19:47:30,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4507790.0, ans=0.125 2024-08-19 19:47:31,562 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 19:47:36,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-08-19 19:47:38,077 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6200, loss[loss=0.08385, beats_loss=0.01225, ecapa_loss=0.0001573, whisper_loss=0.07002, over 15833.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001411, whisper_loss=0.09014, over 3779604.70 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:47:45,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4507890.0, ans=0.0 2024-08-19 19:47:59,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4507990.0, ans=0.0 2024-08-19 19:48:10,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4507990.0, ans=10.0 2024-08-19 19:48:11,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4507990.0, ans=0.125 2024-08-19 19:48:24,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4508090.0, ans=0.125 2024-08-19 19:48:30,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4508090.0, ans=0.0 2024-08-19 19:48:41,967 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 
24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 19:48:59,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.361e+01 2.656e+01 2.980e+01 4.502e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-19 19:49:14,606 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6250, loss[loss=0.08866, beats_loss=0.01494, ecapa_loss=0.0001368, whisper_loss=0.07235, over 17624.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001413, whisper_loss=0.08973, over 3774404.80 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:49:16,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4508390.0, ans=0.125 2024-08-19 19:49:25,427 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-19 19:49:35,865 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 19:49:41,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4508490.0, ans=0.125 2024-08-19 19:49:51,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-19 19:50:05,403 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 19:50:18,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4508690.0, ans=0.125 2024-08-19 19:50:40,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4508790.0, ans=0.0 2024-08-19 19:50:45,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4508790.0, ans=0.125 2024-08-19 19:50:55,206 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6300, loss[loss=0.1204, beats_loss=0.007717, ecapa_loss=0.0001583, whisper_loss=0.1111, over 22306.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001421, whisper_loss=0.09003, over 3764813.60 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:51:03,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4508890.0, ans=0.0 2024-08-19 19:52:12,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0 2024-08-19 19:52:20,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.295e+01 2.454e+01 2.761e+01 3.903e+01, threshold=4.908e+01, percent-clipped=0.0 2024-08-19 19:52:22,634 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 19:52:27,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-08-19 19:52:34,349 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6350, loss[loss=0.09646, beats_loss=0.01089, ecapa_loss=0.0001325, whisper_loss=0.08424, over 15294.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001407, whisper_loss=0.08949, over 3783742.45 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:52:35,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4509390.0, ans=0.125 2024-08-19 19:52:52,054 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 17 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-19 19:53:06,928 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 19:53:31,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4509590.0, ans=0.1 2024-08-19 19:54:08,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4509790.0, ans=0.125 2024-08-19 19:54:13,967 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6400, loss[loss=0.1176, beats_loss=0.009052, ecapa_loss=0.0001222, whisper_loss=0.1073, over 24042.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001416, whisper_loss=0.0898, over 3792572.93 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:54:26,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-19 19:54:33,769 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 19:54:42,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.47 vs. 
limit=15.0 2024-08-19 19:54:52,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4509990.0, ans=0.125 2024-08-19 19:54:57,151 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 20 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 19:55:06,960 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 19:55:20,407 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-19 19:55:23,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4510190.0, ans=0.0 2024-08-19 19:55:29,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-08-19 19:55:35,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2024-08-19 19:55:38,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.373e+01 2.627e+01 3.161e+01 1.061e+02, threshold=5.254e+01, percent-clipped=1.0 2024-08-19 19:55:51,593 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6450, loss[loss=0.1097, beats_loss=0.01166, ecapa_loss=0.0001376, whisper_loss=0.09671, over 22630.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001411, whisper_loss=0.09008, over 3793946.74 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:56:09,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4510490.0, ans=0.0 2024-08-19 19:56:30,581 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 19:57:09,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4510790.0, ans=0.125 2024-08-19 19:57:15,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4510790.0, ans=0.125 2024-08-19 19:57:16,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4510790.0, ans=0.125 2024-08-19 19:57:18,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4510790.0, ans=0.0 2024-08-19 19:57:28,136 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6500, loss[loss=0.1027, beats_loss=0.01032, ecapa_loss=0.0001056, whisper_loss=0.09131, over 21625.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.09013, over 3792967.14 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:57:46,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4510990.0, ans=0.125 2024-08-19 19:57:56,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2024-08-19 19:57:57,442 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 34 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 19:57:59,792 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 19:58:02,996 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 19:58:29,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. 
limit=15.0 2024-08-19 19:58:36,531 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 19:58:43,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.329e+01 2.544e+01 2.954e+01 4.370e+01, threshold=5.088e+01, percent-clipped=0.0 2024-08-19 19:58:55,530 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6550, loss[loss=0.1016, beats_loss=0.00982, ecapa_loss=0.0001566, whisper_loss=0.0902, over 20364.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001418, whisper_loss=0.09022, over 3852264.67 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:59:02,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4511390.0, ans=0.1 2024-08-19 19:59:28,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4511590.0, ans=0.0 2024-08-19 20:00:06,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4511790.0, ans=0.025 2024-08-19 20:00:10,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4511790.0, ans=0.125 2024-08-19 20:00:15,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4511790.0, ans=0.0 2024-08-19 20:00:21,668 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6600, loss[loss=0.1078, beats_loss=0.0109, ecapa_loss=0.0001386, whisper_loss=0.09555, over 15364.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01037, ecapa_loss=0.0001412, whisper_loss=0.09172, over 3901282.03 frames. 
], batch size: 59, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:00:27,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4511890.0, ans=0.125 2024-08-19 20:00:36,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=12.0 2024-08-19 20:01:04,865 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 20:01:06,181 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 20:01:09,363 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 20:01:31,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2024-08-19 20:01:34,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.409e+01 2.632e+01 2.888e+01 4.355e+02, threshold=5.264e+01, percent-clipped=1.0 2024-08-19 20:01:42,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4512290.0, ans=0.125 2024-08-19 20:01:45,093 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6650, loss[loss=0.09883, beats_loss=0.01097, ecapa_loss=0.000159, whisper_loss=0.08627, over 18980.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001412, whisper_loss=0.09121, over 3870743.40 frames. 
], batch size: 78, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:01:55,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4512390.0, ans=0.125 2024-08-19 20:01:57,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4512390.0, ans=0.2 2024-08-19 20:02:17,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2024-08-19 20:02:18,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4512590.0, ans=0.125 2024-08-19 20:02:42,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=22.5 2024-08-19 20:02:51,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4512790.0, ans=0.1 2024-08-19 20:02:51,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4512790.0, ans=10.0 2024-08-19 20:03:06,968 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6700, loss[loss=0.1194, beats_loss=0.008624, ecapa_loss=0.0001525, whisper_loss=0.1093, over 20492.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.09141, over 3904997.02 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:03:08,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4512890.0, ans=0.125 2024-08-19 20:03:20,704 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 20:03:24,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4512990.0, ans=0.1 2024-08-19 20:03:33,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2024-08-19 20:03:36,825 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 20:04:02,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2024-08-19 20:04:03,562 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 22 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-19 20:04:20,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.359e+01 2.752e+01 3.006e+01 5.924e+01, threshold=5.504e+01, percent-clipped=1.0 2024-08-19 20:04:21,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-19 20:04:31,946 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6750, loss[loss=0.1215, beats_loss=0.009175, ecapa_loss=0.0001467, whisper_loss=0.1108, over 15934.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.0001429, whisper_loss=0.09104, over 3897799.78 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:04:50,669 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
17 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 20:05:00,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4513490.0, ans=0.125 2024-08-19 20:05:01,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4513490.0, ans=0.1 2024-08-19 20:05:03,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4513490.0, ans=0.125 2024-08-19 20:05:05,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4513590.0, ans=0.1 2024-08-19 20:05:27,874 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 20:05:30,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4513690.0, ans=0.0 2024-08-19 20:05:32,726 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 14 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 20:05:38,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4513790.0, ans=0.02 2024-08-19 20:05:56,271 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6800, loss[loss=0.106, beats_loss=0.009645, ecapa_loss=0.00016, whisper_loss=0.09478, over 16125.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01033, ecapa_loss=0.0001429, whisper_loss=0.09103, over 3893522.67 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:05:56,531 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 22 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 20:06:00,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.59 vs. 
limit=15.0 2024-08-19 20:06:26,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4513990.0, ans=0.125 2024-08-19 20:07:00,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4514290.0, ans=0.125 2024-08-19 20:07:08,253 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.317e+01 2.530e+01 2.822e+01 4.267e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-19 20:07:14,379 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 25 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-19 20:07:17,728 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6850, loss[loss=0.09231, beats_loss=0.01417, ecapa_loss=0.0001332, whisper_loss=0.07681, over 20835.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001423, whisper_loss=0.09018, over 3853809.05 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:07:22,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4514390.0, ans=0.125 2024-08-19 20:07:23,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4514390.0, ans=0.0 2024-08-19 20:07:26,458 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 20:07:33,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4514490.0, ans=0.1 2024-08-19 20:07:44,562 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 20:07:48,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.88 vs. 
limit=12.0 2024-08-19 20:08:10,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4514690.0, ans=0.0 2024-08-19 20:08:11,382 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 20:08:30,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4514790.0, ans=0.125 2024-08-19 20:08:35,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0 2024-08-19 20:08:40,598 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6900, loss[loss=0.08228, beats_loss=0.0115, ecapa_loss=0.0001253, whisper_loss=0.06953, over 19171.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001425, whisper_loss=0.0898, over 3859353.62 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:08:41,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4514890.0, ans=0.2 2024-08-19 20:08:46,959 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 19 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 20:08:47,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4514890.0, ans=0.95 2024-08-19 20:08:53,266 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.052e+01 2024-08-19 20:09:02,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4514990.0, ans=0.0 2024-08-19 20:09:04,233 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 12 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 20:09:17,533 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 20:09:24,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4515090.0, ans=0.125 2024-08-19 20:09:28,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4515190.0, ans=0.2 2024-08-19 20:09:43,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4515290.0, ans=0.125 2024-08-19 20:09:44,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4515290.0, ans=0.2 2024-08-19 20:09:49,785 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.290e+01 2.485e+01 2.784e+01 7.248e+01, threshold=4.970e+01, percent-clipped=1.0 2024-08-19 20:09:54,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4515290.0, ans=0.125 2024-08-19 20:09:59,387 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 6950, loss[loss=0.1087, beats_loss=0.009709, ecapa_loss=0.0001417, whisper_loss=0.09757, over 22943.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0105, ecapa_loss=0.0001425, whisper_loss=0.08884, over 3866038.27 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:10:05,114 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 20:10:06,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4515390.0, ans=0.1 2024-08-19 20:10:19,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4515490.0, ans=0.0 2024-08-19 20:10:25,562 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 20:10:43,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4515590.0, ans=0.1 2024-08-19 20:10:44,580 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 20:10:48,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-19 20:11:04,294 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 32 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 20:11:14,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-19 20:11:19,792 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7000, loss[loss=0.1057, beats_loss=0.008478, ecapa_loss=0.0001586, whisper_loss=0.09559, over 21723.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001416, whisper_loss=0.08893, over 3866828.30 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:11:23,752 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-19 20:11:32,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4515890.0, ans=0.1 2024-08-19 20:11:32,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-08-19 20:11:37,006 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
21 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-19 20:11:38,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4515990.0, ans=0.125 2024-08-19 20:11:42,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4515990.0, ans=0.2 2024-08-19 20:12:30,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.08 vs. limit=22.5 2024-08-19 20:12:32,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.294e+01 2.487e+01 2.816e+01 5.941e+01, threshold=4.975e+01, percent-clipped=1.0 2024-08-19 20:12:41,757 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7050, loss[loss=0.09614, beats_loss=0.01326, ecapa_loss=0.0001082, whisper_loss=0.0818, over 22921.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01059, ecapa_loss=0.00014, whisper_loss=0.08926, over 3865986.44 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:12:58,618 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 20:13:26,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4516590.0, ans=0.125 2024-08-19 20:13:30,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4516590.0, ans=0.2 2024-08-19 20:13:41,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.42 vs. 
limit=22.5 2024-08-19 20:13:56,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4516790.0, ans=0.1 2024-08-19 20:14:08,908 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7100, loss[loss=0.1086, beats_loss=0.008604, ecapa_loss=0.0001607, whisper_loss=0.09836, over 13925.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001406, whisper_loss=0.08973, over 3878629.88 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:14:10,042 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:14:22,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4516890.0, ans=0.125 2024-08-19 20:14:26,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4516990.0, ans=0.125 2024-08-19 20:14:27,168 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 20:14:41,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4517090.0, ans=0.125 2024-08-19 20:14:45,851 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 22 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-19 20:15:06,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4517190.0, ans=15.0 2024-08-19 20:15:21,428 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.242e+01 2.446e+01 2.720e+01 3.661e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-19 20:15:26,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. 
limit=15.0 2024-08-19 20:15:29,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.58 vs. limit=10.0 2024-08-19 20:15:31,635 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7150, loss[loss=0.1174, beats_loss=0.01032, ecapa_loss=0.0001344, whisper_loss=0.1057, over 23168.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.09015, over 3866660.28 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:15:50,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4517490.0, ans=0.125 2024-08-19 20:15:56,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4517490.0, ans=0.125 2024-08-19 20:16:15,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4517590.0, ans=0.1 2024-08-19 20:16:28,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4517690.0, ans=0.125 2024-08-19 20:16:55,017 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7200, loss[loss=0.08086, beats_loss=0.01193, ecapa_loss=0.000119, whisper_loss=0.06775, over 18572.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001413, whisper_loss=0.08974, over 3835881.26 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:16:55,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4517890.0, ans=0.2 2024-08-19 20:17:01,321 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
22 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-19 20:17:10,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4517990.0, ans=0.2 2024-08-19 20:17:18,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4517990.0, ans=0.0 2024-08-19 20:17:24,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.03 vs. limit=5.0 2024-08-19 20:17:26,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4518090.0, ans=0.2 2024-08-19 20:17:33,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.11 vs. limit=12.0 2024-08-19 20:17:50,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4518190.0, ans=0.1 2024-08-19 20:17:54,661 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 20:17:59,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4518190.0, ans=0.2 2024-08-19 20:18:01,864 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 20:18:08,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.204e+01 2.383e+01 2.664e+01 1.113e+02, threshold=4.766e+01, percent-clipped=1.0 2024-08-19 20:18:18,056 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7250, loss[loss=0.1124, beats_loss=0.009752, ecapa_loss=0.0001276, whisper_loss=0.1014, over 17191.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.09034, over 3846276.22 frames. 
], batch size: 66, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:18:24,507 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-19 20:18:26,137 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 20:18:36,215 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 20:18:42,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-08-19 20:19:26,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4518790.0, ans=0.125 2024-08-19 20:19:32,146 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 18 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-19 20:19:39,702 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7300, loss[loss=0.1022, beats_loss=0.01315, ecapa_loss=0.0001074, whisper_loss=0.08803, over 22798.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001422, whisper_loss=0.09113, over 3820760.18 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:19:58,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4518990.0, ans=0.0 2024-08-19 20:19:59,861 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 20:20:01,394 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 20:20:02,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4518990.0, ans=0.125 2024-08-19 20:20:14,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4519090.0, ans=0.1 2024-08-19 20:20:17,454 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 28 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 20:20:23,266 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09269597381353378, model_norm_threshold=47.66118240356445 2024-08-19 20:20:23,437 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.810e+04, grad_sumsq=3.810e+04, orig_rms_sq=1.000e+00 2024-08-19 20:20:25,821 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 20:20:32,429 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 20:20:53,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.657e+01 2.360e+01 2.664e+01 3.085e+01 5.142e+02, threshold=5.329e+01, percent-clipped=3.0 2024-08-19 20:20:58,755 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 20:21:00,836 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 30 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 20:21:03,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4519390.0, ans=0.04949747468305833 2024-08-19 20:21:04,148 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7350, loss[loss=0.106, beats_loss=0.01143, ecapa_loss=0.0001352, whisper_loss=0.09325, over 22637.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01031, ecapa_loss=0.0001422, whisper_loss=0.09131, over 3834164.33 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:21:10,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4519390.0, ans=0.125 2024-08-19 20:21:22,761 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 20:21:32,160 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 25 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-19 20:21:40,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4519590.0, ans=0.2 2024-08-19 20:22:03,126 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 38 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-19 20:22:03,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-19 20:22:24,178 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 20:22:32,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-19 20:22:34,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4519890.0, ans=0.125 2024-08-19 20:22:35,262 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7400, loss[loss=0.1319, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.1202, over 22836.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01031, ecapa_loss=0.0001414, whisper_loss=0.0916, over 3875692.96 frames. 
], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:22:46,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4519890.0, ans=0.125 2024-08-19 20:22:48,743 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 35 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 20:23:08,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4519990.0, ans=0.125 2024-08-19 20:23:14,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2024-08-19 20:23:14,785 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 20:23:24,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4520090.0, ans=0.125 2024-08-19 20:23:37,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4520190.0, ans=0.0 2024-08-19 20:23:40,225 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 18 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 20:23:44,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4520190.0, ans=0.125 2024-08-19 20:23:53,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.289e+01 2.494e+01 2.789e+01 3.959e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-19 20:24:04,774 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7450, loss[loss=0.1107, beats_loss=0.008102, ecapa_loss=0.0002007, whisper_loss=0.1006, over 22859.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01028, ecapa_loss=0.0001411, whisper_loss=0.09206, over 3893095.70 frames. 
], batch size: 94, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:24:08,667 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 20:24:25,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.17 vs. limit=10.0 2024-08-19 20:24:37,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4520490.0, ans=0.0 2024-08-19 20:25:01,816 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 20:25:06,161 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-19 20:25:06,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. limit=5.0 2024-08-19 20:25:11,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4520690.0, ans=0.125 2024-08-19 20:25:17,902 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 20:25:35,136 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7500, loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.000145, whisper_loss=0.08946, over 23420.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.09167, over 3902068.98 frames. ], batch size: 95, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:25:37,340 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 25 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-19 20:25:49,990 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 20:25:58,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4520990.0, ans=0.125 2024-08-19 20:26:00,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4520990.0, ans=0.1 2024-08-19 20:26:04,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4520990.0, ans=0.0 2024-08-19 20:26:14,048 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 26 from LS+wenet, 18 from Vox, 15 fro AS 2024-08-19 20:26:32,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4521190.0, ans=0.0 2024-08-19 20:26:41,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-19 20:26:43,542 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 20:26:44,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4521190.0, ans=0.0 2024-08-19 20:26:45,764 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-19 20:26:51,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4521290.0, ans=0.0 2024-08-19 20:26:54,706 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 20:26:56,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.301e+01 2.566e+01 2.939e+01 6.434e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-19 20:27:06,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4521390.0, ans=0.125 2024-08-19 20:27:06,802 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7550, loss[loss=0.08543, beats_loss=0.01366, ecapa_loss=0.0001285, whisper_loss=0.07049, over 17587.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.09139, over 3884876.17 frames. ], batch size: 72, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:27:25,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2024-08-19 20:27:40,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4521490.0, ans=0.5 2024-08-19 20:27:57,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.10 vs. limit=22.5 2024-08-19 20:28:36,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=12.0 2024-08-19 20:28:39,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2024-08-19 20:28:40,259 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7600, loss[loss=0.1179, beats_loss=0.008947, ecapa_loss=0.0001208, whisper_loss=0.1077, over 15351.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01029, ecapa_loss=0.0001416, whisper_loss=0.09152, over 3889305.98 frames. 
], batch size: 58, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:28:52,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=4521890.0, ans=0.1 2024-08-19 20:28:54,645 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 20:28:57,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4521990.0, ans=0.0 2024-08-19 20:29:46,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4522190.0, ans=0.125 2024-08-19 20:29:49,150 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 20:29:52,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4522190.0, ans=0.125 2024-08-19 20:30:01,179 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 
14 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-19 20:30:03,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4522290.0, ans=0.125 2024-08-19 20:30:04,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.242e+01 2.534e+01 2.867e+01 1.676e+03, threshold=5.067e+01, percent-clipped=0.0 2024-08-19 20:30:04,592 WARNING [optim.py:496] (3/4) Scaling gradients by 0.030242323875427246, model_norm_threshold=50.67152404785156 2024-08-19 20:30:04,765 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.977e+05, grad_sumsq=7.564e+07, orig_rms_sq=1.055e-02 2024-08-19 20:30:15,791 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7650, loss[loss=0.1047, beats_loss=0.009948, ecapa_loss=0.0001423, whisper_loss=0.09331, over 15456.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01024, ecapa_loss=0.0001414, whisper_loss=0.09119, over 3832558.41 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:30:20,024 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 23 from LS+wenet, 13 from Vox, 18 fro AS 2024-08-19 20:30:24,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4522390.0, ans=0.125 2024-08-19 20:30:58,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4522590.0, ans=0.0 2024-08-19 20:31:14,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4522690.0, ans=0.125 2024-08-19 20:31:41,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.01 vs. 
limit=15.0 2024-08-19 20:31:46,314 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-19 20:31:50,083 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7700, loss[loss=0.1106, beats_loss=0.01108, ecapa_loss=0.0001057, whisper_loss=0.0985, over 16570.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01026, ecapa_loss=0.0001417, whisper_loss=0.09096, over 3795657.85 frames. ], batch size: 61, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:32:09,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4522990.0, ans=0.0 2024-08-19 20:32:18,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4522990.0, ans=0.0 2024-08-19 20:32:21,124 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 20:32:29,898 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-19 20:32:36,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4523090.0, ans=0.125 2024-08-19 20:32:39,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4523090.0, ans=0.2 2024-08-19 20:32:45,789 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 24 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 20:33:10,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.323e+01 2.517e+01 2.796e+01 4.474e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-19 20:33:11,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4523290.0, ans=0.0 2024-08-19 20:33:19,416 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7750, loss[loss=0.08816, beats_loss=0.009752, ecapa_loss=0.0001461, whisper_loss=0.07695, over 22447.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001412, whisper_loss=0.08967, over 3791955.86 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:33:19,597 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 20:33:29,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4523390.0, ans=0.0 2024-08-19 20:33:33,688 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 24 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 20:33:39,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4523490.0, ans=0.125 2024-08-19 20:33:58,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4523590.0, ans=0.2 2024-08-19 20:34:07,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4523590.0, ans=0.0 2024-08-19 20:34:23,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4523690.0, ans=0.025 2024-08-19 20:34:27,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4523690.0, ans=0.1 2024-08-19 20:34:36,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4523790.0, ans=0.0 2024-08-19 20:34:50,106 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7800, loss[loss=0.08935, beats_loss=0.01284, ecapa_loss=7.637e-05, whisper_loss=0.07574, over 15324.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08949, over 3799222.81 frames. 
], batch size: 57, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:35:07,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-19 20:35:12,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-19 20:35:14,820 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-19 20:35:18,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4523990.0, ans=0.125 2024-08-19 20:35:39,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4524090.0, ans=0.125 2024-08-19 20:35:39,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4524090.0, ans=0.125 2024-08-19 20:35:41,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4524190.0, ans=0.1 2024-08-19 20:35:45,939 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
21 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 20:35:57,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4524190.0, ans=0.125 2024-08-19 20:36:06,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4524290.0, ans=0.125 2024-08-19 20:36:09,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.228e+01 2.464e+01 2.830e+01 4.593e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-19 20:36:10,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4524290.0, ans=0.2 2024-08-19 20:36:18,142 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7850, loss[loss=0.09679, beats_loss=0.00994, ecapa_loss=0.0001513, whisper_loss=0.08534, over 19357.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.08924, over 3814613.47 frames. ], batch size: 79, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:36:19,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4524390.0, ans=0.0 2024-08-19 20:36:25,211 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 27 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 20:36:29,459 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:36:46,575 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 20:36:57,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4524590.0, ans=0.125 2024-08-19 20:37:19,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4524690.0, ans=0.2 2024-08-19 20:37:20,082 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 20:37:20,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4524690.0, ans=0.0 2024-08-19 20:37:42,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4524790.0, ans=0.2 2024-08-19 20:37:46,793 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7900, loss[loss=0.1086, beats_loss=0.008963, ecapa_loss=0.0001606, whisper_loss=0.098, over 23003.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001414, whisper_loss=0.0896, over 3833318.66 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:38:03,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.73 vs. limit=10.0 2024-08-19 20:38:38,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4525190.0, ans=0.125 2024-08-19 20:38:43,651 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 20:38:45,456 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 15 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-19 20:38:50,333 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
30 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 20:39:06,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.308e+01 2.634e+01 2.974e+01 4.173e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-19 20:39:15,486 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 7950, loss[loss=0.1081, beats_loss=0.01012, ecapa_loss=0.0001604, whisper_loss=0.09633, over 22393.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001413, whisper_loss=0.08993, over 3843914.67 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:39:31,521 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 20:39:37,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4525490.0, ans=0.2 2024-08-19 20:39:39,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4525490.0, ans=0.0 2024-08-19 20:39:40,177 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 20:40:02,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4525590.0, ans=0.1 2024-08-19 20:40:03,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4525590.0, ans=0.0 2024-08-19 20:40:05,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0 2024-08-19 20:40:20,323 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
24 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-19 20:40:27,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4525790.0, ans=0.1 2024-08-19 20:40:34,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4525790.0, ans=0.1 2024-08-19 20:40:36,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4525790.0, ans=0.125 2024-08-19 20:40:42,440 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8000, loss[loss=0.09832, beats_loss=0.01229, ecapa_loss=0.0001233, whisper_loss=0.08479, over 22441.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001398, whisper_loss=0.08967, over 3845464.58 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:40:49,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=12.0 2024-08-19 20:40:50,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4525890.0, ans=0.125 2024-08-19 20:40:52,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4525890.0, ans=0.1 2024-08-19 20:41:00,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=12.0 2024-08-19 20:41:21,928 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
24 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 20:41:26,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4526090.0, ans=0.0 2024-08-19 20:41:28,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4526090.0, ans=0.0 2024-08-19 20:41:28,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4526090.0, ans=0.125 2024-08-19 20:41:36,925 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 20:41:45,804 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-19 20:41:52,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4526190.0, ans=0.125 2024-08-19 20:41:54,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4526290.0, ans=0.0 2024-08-19 20:42:05,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.388e+01 2.587e+01 2.894e+01 4.259e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-19 20:42:15,379 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8050, loss[loss=0.08993, beats_loss=0.01144, ecapa_loss=0.0001371, whisper_loss=0.07712, over 18757.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001406, whisper_loss=0.08884, over 3810632.11 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:42:20,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4526390.0, ans=0.035 2024-08-19 20:42:37,497 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
14 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 20:42:53,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4526590.0, ans=0.0 2024-08-19 20:43:04,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4526590.0, ans=0.1 2024-08-19 20:43:18,898 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 15 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 20:43:40,014 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 20:43:47,684 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 20:43:49,413 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8100, loss[loss=0.09705, beats_loss=0.009283, ecapa_loss=0.0001257, whisper_loss=0.08651, over 19129.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0106, ecapa_loss=0.0001402, whisper_loss=0.08865, over 3851112.21 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:43:52,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4526890.0, ans=0.125 2024-08-19 20:43:52,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0 2024-08-19 20:43:59,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-19 20:44:02,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.36 vs. 
limit=12.0 2024-08-19 20:45:02,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4527190.0, ans=10.0 2024-08-19 20:45:11,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4527290.0, ans=0.0 2024-08-19 20:45:16,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4527290.0, ans=0.125 2024-08-19 20:45:20,507 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.340e+01 2.531e+01 2.954e+01 4.685e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-19 20:45:25,293 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 20:45:30,755 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8150, loss[loss=0.1152, beats_loss=0.009515, ecapa_loss=0.0001522, whisper_loss=0.1042, over 16308.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001409, whisper_loss=0.08958, over 3835163.07 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:45:39,500 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 20:46:01,141 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 20:46:07,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4527590.0, ans=0.125 2024-08-19 20:46:10,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4527590.0, ans=0.125 2024-08-19 20:46:15,582 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-19 20:46:30,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4527690.0, ans=0.0 2024-08-19 20:46:33,823 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 20:46:53,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4527790.0, ans=0.125 2024-08-19 20:47:02,169 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 20:47:07,995 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8200, loss[loss=0.116, beats_loss=0.01073, ecapa_loss=0.0001059, whisper_loss=0.1042, over 24769.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.09024, over 3854722.87 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:47:24,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4527890.0, ans=0.125 2024-08-19 20:47:30,141 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 22 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-19 20:47:50,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4528090.0, ans=0.1 2024-08-19 20:47:58,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2024-08-19 20:48:14,254 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 20:48:35,369 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.270e+01 2.400e+01 2.604e+01 4.192e+01, threshold=4.800e+01, percent-clipped=0.0 2024-08-19 20:48:38,793 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 18 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 20:48:40,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-08-19 20:48:44,753 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8250, loss[loss=0.09056, beats_loss=0.01403, ecapa_loss=0.0001008, whisper_loss=0.07551, over 21714.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001404, whisper_loss=0.09022, over 3836437.49 frames. ], batch size: 85, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:49:12,888 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 23 from LS+wenet, 32 from Vox, 36 fro AS 2024-08-19 20:49:32,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4528590.0, ans=0.125 2024-08-19 20:49:41,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4528690.0, ans=0.125 2024-08-19 20:49:55,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=12.0 2024-08-19 20:50:00,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4528690.0, ans=0.125 2024-08-19 20:50:15,512 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
22 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 20:50:20,497 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8300, loss[loss=0.0878, beats_loss=0.01378, ecapa_loss=0.0001306, whisper_loss=0.07272, over 19331.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001404, whisper_loss=0.09014, over 3841717.52 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:50:26,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4528890.0, ans=0.1 2024-08-19 20:50:30,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4528890.0, ans=0.1 2024-08-19 20:50:45,746 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 20:50:55,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4529090.0, ans=0.035 2024-08-19 20:51:11,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4529090.0, ans=0.0 2024-08-19 20:51:42,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.312e+01 2.525e+01 2.734e+01 6.042e+01, threshold=5.050e+01, percent-clipped=1.0 2024-08-19 20:51:51,665 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8350, loss[loss=0.1133, beats_loss=0.01048, ecapa_loss=0.0001166, whisper_loss=0.1017, over 18918.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001413, whisper_loss=0.08983, over 3859414.17 frames. ], batch size: 72, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:52:08,039 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 20:52:09,488 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-19 20:52:56,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4529690.0, ans=0.0 2024-08-19 20:53:00,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4529690.0, ans=0.07 2024-08-19 20:53:07,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4529690.0, ans=0.125 2024-08-19 20:53:15,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4529790.0, ans=0.125 2024-08-19 20:53:15,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4529790.0, ans=0.2 2024-08-19 20:53:17,672 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 26 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-19 20:53:29,288 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8400, loss[loss=0.1074, beats_loss=0.008291, ecapa_loss=0.0001302, whisper_loss=0.09781, over 15434.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.08962, over 3835408.75 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:53:38,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.52 vs. 
limit=15.0 2024-08-19 20:53:45,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4529990.0, ans=0.125 2024-08-19 20:53:59,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4529990.0, ans=0.125 2024-08-19 20:54:35,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-19 20:54:35,935 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 20:54:38,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4530290.0, ans=0.2 2024-08-19 20:54:38,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4530290.0, ans=0.1 2024-08-19 20:54:41,590 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 13 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 20:54:46,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4530290.0, ans=0.0 2024-08-19 20:54:48,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.271e+01 2.577e+01 2.838e+01 4.178e+01, threshold=5.155e+01, percent-clipped=0.0 2024-08-19 20:54:49,098 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 20:54:50,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4530290.0, ans=0.1 2024-08-19 20:54:55,531 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 20:54:59,408 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8450, loss[loss=0.1114, beats_loss=0.009555, ecapa_loss=0.0001177, whisper_loss=0.1006, over 23395.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001417, whisper_loss=0.0902, over 3824375.00 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:55:09,999 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 20:55:12,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4530390.0, ans=0.0 2024-08-19 20:55:14,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4530390.0, ans=0.0 2024-08-19 20:55:37,159 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 20:56:04,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=4530690.0, ans=0.02 2024-08-19 20:56:40,131 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8500, loss[loss=0.09504, beats_loss=0.009315, ecapa_loss=0.0001346, whisper_loss=0.08438, over 18626.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001421, whisper_loss=0.08955, over 3809061.59 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:56:43,880 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 16 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 20:56:46,190 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 20:56:48,195 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 20:56:53,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4530890.0, ans=0.95 2024-08-19 20:57:02,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4530990.0, ans=0.125 2024-08-19 20:57:45,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4531190.0, ans=0.1 2024-08-19 20:58:12,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4531290.0, ans=0.0 2024-08-19 20:58:13,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.342e+01 2.609e+01 2.850e+01 2.704e+02, threshold=5.218e+01, percent-clipped=2.0 2024-08-19 20:58:23,657 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8550, loss[loss=0.09668, beats_loss=0.01256, ecapa_loss=0.0001369, whisper_loss=0.08275, over 22514.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001427, whisper_loss=0.08976, over 3824190.16 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:58:38,479 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 20:58:50,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4531490.0, ans=0.125 2024-08-19 20:58:53,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4531490.0, ans=0.0 2024-08-19 20:59:22,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4531690.0, ans=0.0 2024-08-19 20:59:25,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4531690.0, ans=0.125 2024-08-19 20:59:28,938 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 20:59:59,588 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8600, loss[loss=0.09383, beats_loss=0.0115, ecapa_loss=0.0001318, whisper_loss=0.08101, over 16762.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01027, ecapa_loss=0.0001427, whisper_loss=0.09068, over 3822007.49 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:00:25,182 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 28 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 21:00:48,891 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 21:01:04,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-19 21:01:14,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=12.0 2024-08-19 21:01:21,259 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 21:01:26,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4532290.0, ans=0.125 2024-08-19 21:01:27,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4532290.0, ans=0.0 2024-08-19 21:01:29,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4532290.0, ans=0.125 2024-08-19 21:01:30,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.352e+01 2.546e+01 2.927e+01 4.092e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-19 21:01:39,170 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8650, loss[loss=0.1286, beats_loss=0.009233, ecapa_loss=0.0001227, whisper_loss=0.1181, over 19277.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001416, whisper_loss=0.09064, over 3823955.62 frames. ], batch size: 72, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:01:40,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.85 vs. limit=10.0 2024-08-19 21:02:25,847 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 25 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-19 21:02:27,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2024-08-19 21:02:33,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-08-19 21:02:51,515 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 
23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 21:02:54,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-08-19 21:03:03,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4532790.0, ans=0.125 2024-08-19 21:03:09,304 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 19 from LS+wenet, 19 from Vox, 53 fro AS 2024-08-19 21:03:10,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=4532790.0, ans=0.2 2024-08-19 21:03:14,439 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8700, loss[loss=0.1131, beats_loss=0.009469, ecapa_loss=0.0001357, whisper_loss=0.1023, over 16210.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.09065, over 3823122.00 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:03:20,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4532890.0, ans=0.0 2024-08-19 21:03:22,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4532890.0, ans=0.0 2024-08-19 21:03:26,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0 2024-08-19 21:03:29,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4532890.0, ans=0.125 2024-08-19 21:03:44,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4532990.0, ans=0.0 2024-08-19 21:04:29,103 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 21:04:34,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.263e+01 2.457e+01 2.766e+01 3.922e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-19 21:04:38,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4533290.0, ans=0.0 2024-08-19 21:04:43,642 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8750, loss[loss=0.1156, beats_loss=0.008984, ecapa_loss=0.0001109, whisper_loss=0.1055, over 15933.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001407, whisper_loss=0.09136, over 3847234.62 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:04:54,301 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 19 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 21:05:00,913 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 16 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 21:05:07,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4533490.0, ans=0.125 2024-08-19 21:05:09,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4533490.0, ans=0.2 2024-08-19 21:05:45,505 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 24 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 21:05:58,521 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 21:06:16,987 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8800, loss[loss=0.1154, beats_loss=0.008697, ecapa_loss=0.0001556, whisper_loss=0.1052, over 16435.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.0001411, whisper_loss=0.09109, over 3816659.47 frames. 
], batch size: 64, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:06:31,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4533890.0, ans=0.0 2024-08-19 21:06:37,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2024-08-19 21:06:47,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4533990.0, ans=0.1 2024-08-19 21:06:52,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4534090.0, ans=0.07 2024-08-19 21:07:06,145 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 29 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 21:07:14,668 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 21:07:14,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4534190.0, ans=0.0 2024-08-19 21:07:25,997 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 21:07:33,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.239e+01 2.468e+01 2.713e+01 3.674e+01, threshold=4.936e+01, percent-clipped=0.0 2024-08-19 21:07:42,250 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8850, loss[loss=0.1185, beats_loss=0.009094, ecapa_loss=0.0001551, whisper_loss=0.1079, over 15725.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001413, whisper_loss=0.09056, over 3818578.36 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:07:46,221 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 21:08:09,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4534490.0, ans=0.1 2024-08-19 21:08:10,597 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05975715443491936, model_norm_threshold=49.35716247558594 2024-08-19 21:08:10,766 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.conv_module1.depthwise_conv.causal_conv.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.671e+04, grad_sumsq=1.079e+05, orig_rms_sq=6.184e-01 2024-08-19 21:08:20,856 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 29 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-19 21:08:37,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4534690.0, ans=0.2 2024-08-19 21:08:40,066 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 37 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 21:08:40,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2024-08-19 21:08:57,875 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 21:09:03,748 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8900, loss[loss=0.09914, beats_loss=0.009926, ecapa_loss=0.0001488, whisper_loss=0.08772, over 18186.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001416, whisper_loss=0.09076, over 3842681.13 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:09:28,656 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
19 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 21:09:36,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.12 vs. limit=6.0 2024-08-19 21:09:40,581 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 23 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-19 21:09:40,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4535090.0, ans=0.125 2024-08-19 21:09:42,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4535090.0, ans=0.0 2024-08-19 21:09:56,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4535190.0, ans=0.0 2024-08-19 21:09:57,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=12.0 2024-08-19 21:10:14,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.373e+01 2.651e+01 2.938e+01 8.260e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-19 21:10:22,805 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 8950, loss[loss=0.08707, beats_loss=0.01126, ecapa_loss=0.0001179, whisper_loss=0.07463, over 14285.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.09055, over 3844312.34 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:10:23,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4535390.0, ans=0.125 2024-08-19 21:10:47,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. 
limit=22.5 2024-08-19 21:10:47,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.97 vs. limit=6.0 2024-08-19 21:11:10,592 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 21:11:26,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4535690.0, ans=0.1 2024-08-19 21:11:33,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-08-19 21:11:48,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4535790.0, ans=0.0 2024-08-19 21:11:50,882 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9000, loss[loss=0.08279, beats_loss=0.01245, ecapa_loss=0.000146, whisper_loss=0.06888, over 21567.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001423, whisper_loss=0.0906, over 3841553.04 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:11:50,883 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-19 21:12:26,935 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005115, whisper_loss=0.248, over 931116.00 frames. 2024-08-19 21:12:49,635 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003978, beats_loss=0, ecapa_loss=0.0003978, whisper_loss=0, over 944235.00 frames. 
2024-08-19 21:14:25,089 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6123, 3.2462, 3.7934, 3.6589], device='cuda:3') 2024-08-19 21:14:27,888 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 21:14:27,894 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-19 21:14:45,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4535990.0, ans=0.1 2024-08-19 21:15:00,391 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 21:15:15,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2024-08-19 21:15:30,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4536190.0, ans=0.125 2024-08-19 21:15:35,895 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-19 21:15:55,756 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-19 21:16:01,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.332e+01 2.634e+01 2.859e+01 8.780e+01, threshold=5.268e+01, percent-clipped=1.0 2024-08-19 21:16:11,963 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9050, loss[loss=0.08887, beats_loss=0.008923, ecapa_loss=0.0001543, whisper_loss=0.0784, over 13221.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001413, whisper_loss=0.0902, over 3856873.23 frames. 
], batch size: 52, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:16:21,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4536390.0, ans=0.125 2024-08-19 21:16:51,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4536590.0, ans=0.05 2024-08-19 21:16:52,228 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07672171294689178, model_norm_threshold=52.675148010253906 2024-08-19 21:16:52,400 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.217e+05, grad_sumsq=1.217e+05, orig_rms_sq=1.000e+00 2024-08-19 21:16:53,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0 2024-08-19 21:16:55,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4536590.0, ans=0.0 2024-08-19 21:17:06,563 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 21:17:13,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4536690.0, ans=0.1 2024-08-19 21:17:24,270 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 21:17:26,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4536690.0, ans=0.09899494936611666 2024-08-19 21:17:34,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4536790.0, ans=0.0 2024-08-19 21:17:35,526 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
25 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 21:17:40,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4536790.0, ans=0.1 2024-08-19 21:17:49,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4536790.0, ans=0.125 2024-08-19 21:17:51,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4536890.0, ans=0.09899494936611666 2024-08-19 21:17:52,597 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9100, loss[loss=0.08632, beats_loss=0.009147, ecapa_loss=0.0001335, whisper_loss=0.07584, over 15143.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001406, whisper_loss=0.08958, over 3836152.33 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:18:10,932 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:18:34,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4537090.0, ans=0.125 2024-08-19 21:18:44,449 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 28 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 21:18:50,853 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 11 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 21:18:51,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.74 vs. 
limit=15.0 2024-08-19 21:18:53,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4537190.0, ans=0.0 2024-08-19 21:18:53,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=12.0 2024-08-19 21:18:58,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4537190.0, ans=0.0 2024-08-19 21:19:03,416 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 12 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 21:19:18,577 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 21:19:22,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.03 vs. limit=10.0 2024-08-19 21:19:24,565 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.187e+01 2.511e+01 2.857e+01 6.866e+02, threshold=5.022e+01, percent-clipped=2.0 2024-08-19 21:19:26,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4537290.0, ans=0.2 2024-08-19 21:19:27,996 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 21:19:34,244 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9150, loss[loss=0.1228, beats_loss=0.007514, ecapa_loss=0.0001398, whisper_loss=0.1139, over 18228.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.09014, over 3845455.81 frames. 
], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:19:47,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4537390.0, ans=0.1 2024-08-19 21:19:56,573 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 21:20:07,757 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 21:20:28,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4537590.0, ans=10.0 2024-08-19 21:20:50,389 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 15 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 21:20:57,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4537790.0, ans=0.05 2024-08-19 21:21:09,829 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9200, loss[loss=0.08803, beats_loss=0.01176, ecapa_loss=0.0001261, whisper_loss=0.07501, over 14177.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001412, whisper_loss=0.08976, over 3821013.27 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:21:24,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4537890.0, ans=0.2 2024-08-19 21:21:34,739 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 21:21:44,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=22.5 2024-08-19 21:22:06,906 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 21:22:25,454 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
13 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-19 21:22:39,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.332e+01 2.620e+01 2.969e+01 6.711e+01, threshold=5.240e+01, percent-clipped=2.0 2024-08-19 21:22:49,874 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9250, loss[loss=0.1246, beats_loss=0.009252, ecapa_loss=0.0001708, whisper_loss=0.1137, over 18603.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001413, whisper_loss=0.09053, over 3838521.42 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:22:55,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.18 vs. limit=15.0 2024-08-19 21:22:57,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4538390.0, ans=0.125 2024-08-19 21:23:03,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.91 vs. limit=5.0 2024-08-19 21:23:18,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4538490.0, ans=0.125 2024-08-19 21:23:28,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4538590.0, ans=0.1 2024-08-19 21:23:37,422 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-19 21:23:50,510 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 21:24:25,339 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9300, loss[loss=0.1011, beats_loss=0.009853, ecapa_loss=0.000124, whisper_loss=0.09005, over 23876.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001412, whisper_loss=0.09083, over 3843373.82 frames. 
], batch size: 96, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:24:36,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4538890.0, ans=0.1 2024-08-19 21:24:43,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4538990.0, ans=0.125 2024-08-19 21:25:02,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4539090.0, ans=0.125 2024-08-19 21:25:28,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4539190.0, ans=0.0 2024-08-19 21:25:36,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4539190.0, ans=0.0 2024-08-19 21:25:50,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.320e+01 2.550e+01 2.886e+01 6.142e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-19 21:26:00,075 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9350, loss[loss=0.09465, beats_loss=0.01124, ecapa_loss=0.0001494, whisper_loss=0.08192, over 20528.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.0904, over 3863034.37 frames. ], batch size: 82, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:26:23,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.11 vs. 
limit=15.0 2024-08-19 21:26:41,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4539590.0, ans=0.125 2024-08-19 21:27:01,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4539690.0, ans=0.125 2024-08-19 21:27:33,481 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9400, loss[loss=0.0878, beats_loss=0.01135, ecapa_loss=0.0001345, whisper_loss=0.0751, over 15203.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001407, whisper_loss=0.09032, over 3851249.47 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:27:50,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4539890.0, ans=0.0 2024-08-19 21:27:55,963 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 35 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 21:28:00,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4539990.0, ans=0.0 2024-08-19 21:28:06,593 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.127e+01 2024-08-19 21:28:09,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4540090.0, ans=0.125 2024-08-19 21:28:14,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4540090.0, ans=0.125 2024-08-19 21:28:22,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-08-19 21:28:36,484 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
23 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 21:28:42,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4540190.0, ans=0.1 2024-08-19 21:28:45,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-19 21:28:53,235 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 21:28:56,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4540290.0, ans=0.125 2024-08-19 21:28:57,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.227e+01 2.494e+01 2.747e+01 6.860e+01, threshold=4.987e+01, percent-clipped=1.0 2024-08-19 21:29:00,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4540290.0, ans=0.125 2024-08-19 21:29:07,226 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9450, loss[loss=0.08782, beats_loss=0.0119, ecapa_loss=0.0001738, whisper_loss=0.07419, over 17149.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001422, whisper_loss=0.08966, over 3848164.93 frames. ], batch size: 75, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:29:07,660 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 21:29:15,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4540390.0, ans=0.0 2024-08-19 21:29:30,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4540490.0, ans=0.2 2024-08-19 21:29:46,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. 
limit=15.0 2024-08-19 21:29:53,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4540590.0, ans=0.125 2024-08-19 21:30:00,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-08-19 21:30:18,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4540690.0, ans=0.125 2024-08-19 21:30:48,111 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9500, loss[loss=0.11, beats_loss=0.008276, ecapa_loss=0.000199, whisper_loss=0.09971, over 12655.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.000142, whisper_loss=0.09061, over 3839803.53 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:30:56,115 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 21:30:58,441 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 21:31:11,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4540990.0, ans=0.1 2024-08-19 21:31:12,035 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 21:31:33,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4541090.0, ans=0.125 2024-08-19 21:31:42,226 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 25 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 21:31:51,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. 
limit=6.0 2024-08-19 21:32:05,790 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 21:32:17,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.220e+01 2.482e+01 2.741e+01 4.040e+01, threshold=4.965e+01, percent-clipped=0.0 2024-08-19 21:32:22,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4541290.0, ans=0.125 2024-08-19 21:32:25,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-19 21:32:26,005 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 21:32:26,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4541390.0, ans=0.125 2024-08-19 21:32:27,365 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9550, loss[loss=0.1079, beats_loss=0.009058, ecapa_loss=0.0001255, whisper_loss=0.09755, over 16938.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001409, whisper_loss=0.09056, over 3822783.03 frames. 
], batch size: 64, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:32:32,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4541390.0, ans=0.0 2024-08-19 21:32:36,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4541390.0, ans=0.125 2024-08-19 21:32:50,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4541490.0, ans=0.025 2024-08-19 21:32:55,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=22.5 2024-08-19 21:33:10,701 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 21:33:14,357 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 21:33:22,073 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 21:33:38,950 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 32 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 21:33:51,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-19 21:33:58,037 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 21:34:02,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-19 21:34:02,974 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9600, loss[loss=0.1108, beats_loss=0.008861, ecapa_loss=0.0001352, whisper_loss=0.1006, over 17030.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.0001414, whisper_loss=0.09084, over 3820845.81 frames. ], batch size: 67, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:34:18,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4541890.0, ans=0.1 2024-08-19 21:34:34,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4541990.0, ans=0.125 2024-08-19 21:35:07,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4542190.0, ans=0.0 2024-08-19 21:35:08,678 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 35 from Vox, 33 fro AS 2024-08-19 21:35:27,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-19 21:35:31,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4542290.0, ans=0.0 2024-08-19 21:35:34,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.509e+01 2.746e+01 4.719e+01, threshold=5.019e+01, percent-clipped=0.0 2024-08-19 21:35:45,885 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9650, loss[loss=0.1005, beats_loss=0.01088, ecapa_loss=0.0001078, whisper_loss=0.08852, over 19696.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01026, ecapa_loss=0.0001416, whisper_loss=0.0909, over 3820764.61 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:35:47,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4542390.0, ans=0.0 2024-08-19 21:36:07,410 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 
22 from LS+wenet, 11 from Vox, 16 fro AS 2024-08-19 21:36:24,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4542490.0, ans=0.125 2024-08-19 21:36:26,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-19 21:36:28,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4542590.0, ans=0.0 2024-08-19 21:36:35,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=12.0 2024-08-19 21:36:39,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4542590.0, ans=0.1 2024-08-19 21:36:47,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4542690.0, ans=0.1 2024-08-19 21:36:50,567 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 21:37:29,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4542890.0, ans=0.1 2024-08-19 21:37:29,853 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9700, loss[loss=0.1046, beats_loss=0.01024, ecapa_loss=0.0001529, whisper_loss=0.09279, over 20257.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001408, whisper_loss=0.09083, over 3811616.57 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:37:36,749 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 16 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-19 21:37:44,769 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 21:37:47,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4542890.0, ans=0.125 2024-08-19 21:37:48,790 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 21 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-19 21:38:03,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4542990.0, ans=0.07 2024-08-19 21:38:33,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-08-19 21:38:41,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4543190.0, ans=0.5 2024-08-19 21:39:02,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.331e+01 2.511e+01 2.867e+01 4.334e+02, threshold=5.023e+01, percent-clipped=2.0 2024-08-19 21:39:06,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4543290.0, ans=0.125 2024-08-19 21:39:12,108 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9750, loss[loss=0.07666, beats_loss=0.009865, ecapa_loss=0.0001645, whisper_loss=0.06515, over 15111.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001396, whisper_loss=0.0906, over 3783018.08 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:39:24,698 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
27 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 21:40:12,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4543690.0, ans=0.0 2024-08-19 21:40:28,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4543790.0, ans=0.125 2024-08-19 21:40:48,540 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9800, loss[loss=0.1093, beats_loss=0.009739, ecapa_loss=0.0001466, whisper_loss=0.09805, over 19525.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001405, whisper_loss=0.09008, over 3751605.67 frames. ], batch size: 79, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:40:48,772 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 21:40:49,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4543890.0, ans=0.125 2024-08-19 21:41:02,692 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 21:41:18,511 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 21:41:31,806 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 21:42:16,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.232e+01 2.541e+01 2.711e+01 1.410e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-19 21:42:26,629 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9850, loss[loss=0.07073, beats_loss=0.01197, ecapa_loss=0.0001803, whisper_loss=0.05696, over 13692.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.09024, over 3772983.80 frames. 
], batch size: 58, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:42:29,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4544390.0, ans=0.125 2024-08-19 21:42:31,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.13 vs. limit=22.5 2024-08-19 21:42:35,782 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 23 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 21:42:47,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2024-08-19 21:43:03,212 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 21:43:20,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4544690.0, ans=0.2 2024-08-19 21:43:35,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=12.0 2024-08-19 21:43:41,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-19 21:43:45,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4544790.0, ans=0.125 2024-08-19 21:43:52,353 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 25 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-19 21:43:55,319 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 21:43:56,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4544890.0, ans=0.1 2024-08-19 21:43:57,261 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9900, loss[loss=0.09924, beats_loss=0.01186, ecapa_loss=0.0001312, whisper_loss=0.08607, over 22625.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.000142, whisper_loss=0.09052, over 3770825.30 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:44:03,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4544890.0, ans=0.125 2024-08-19 21:44:17,427 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 21 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 21:44:56,450 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 21:45:17,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.278e+01 2.511e+01 2.738e+01 3.827e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-19 21:45:25,781 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 9950, loss[loss=0.09532, beats_loss=0.008878, ecapa_loss=0.0001484, whisper_loss=0.08496, over 17540.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001414, whisper_loss=0.09066, over 3748010.77 frames. ], batch size: 67, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:45:28,181 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 21:45:34,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4545390.0, ans=0.0 2024-08-19 21:45:39,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. 
limit=22.5 2024-08-19 21:45:46,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4545490.0, ans=0.1 2024-08-19 21:45:54,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4545490.0, ans=0.0 2024-08-19 21:46:10,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4545590.0, ans=0.125 2024-08-19 21:46:16,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.63 vs. limit=22.5 2024-08-19 21:46:16,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-08-19 21:46:56,083 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10000, loss[loss=0.1108, beats_loss=0.008682, ecapa_loss=0.0001329, whisper_loss=0.1008, over 15248.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001411, whisper_loss=0.0904, over 3788772.40 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:47:11,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4545890.0, ans=0.0 2024-08-19 21:47:16,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4545990.0, ans=0.125 2024-08-19 21:47:26,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4545990.0, ans=0.125 2024-08-19 21:47:39,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.11 vs. 
limit=15.0 2024-08-19 21:47:53,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4546190.0, ans=0.0 2024-08-19 21:48:04,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2024-08-19 21:48:17,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.220e+01 2.388e+01 2.607e+01 4.207e+01, threshold=4.776e+01, percent-clipped=0.0 2024-08-19 21:48:26,806 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10050, loss[loss=0.1018, beats_loss=0.009458, ecapa_loss=0.000142, whisper_loss=0.09097, over 15214.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.09067, over 3802046.45 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:48:31,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0 2024-08-19 21:48:45,289 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 21:49:07,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2024-08-19 21:49:57,071 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10100, loss[loss=0.09857, beats_loss=0.008926, ecapa_loss=0.0001704, whisper_loss=0.08794, over 12013.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001409, whisper_loss=0.09032, over 3847633.95 frames. 
], batch size: 50, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:50:05,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4546890.0, ans=0.0 2024-08-19 21:51:05,504 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 21:51:12,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4547290.0, ans=0.125 2024-08-19 21:51:12,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4547290.0, ans=0.2 2024-08-19 21:51:13,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4547290.0, ans=0.0 2024-08-19 21:51:21,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.291e+01 2.540e+01 2.770e+01 3.592e+02, threshold=5.079e+01, percent-clipped=1.0 2024-08-19 21:51:30,717 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10150, loss[loss=0.1004, beats_loss=0.01027, ecapa_loss=0.0001342, whisper_loss=0.08877, over 15363.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001408, whisper_loss=0.0897, over 3826654.75 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:51:33,074 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 21:51:42,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4547390.0, ans=0.0 2024-08-19 21:51:47,846 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 21:52:14,075 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 21:52:24,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2024-08-19 21:53:04,953 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10200, loss[loss=0.08364, beats_loss=0.01245, ecapa_loss=0.0001222, whisper_loss=0.06997, over 23244.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.08923, over 3793653.90 frames. ], batch size: 95, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:53:05,139 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 32 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 21:53:13,542 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 21 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-19 21:53:32,369 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 21:53:33,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4547990.0, ans=0.125 2024-08-19 21:53:40,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4547990.0, ans=0.0 2024-08-19 21:53:48,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-08-19 21:53:48,614 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 21:53:51,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4548090.0, ans=0.0 2024-08-19 21:53:54,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4548090.0, ans=0.125 2024-08-19 21:54:25,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2024-08-19 21:54:33,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.328e+01 2.553e+01 2.832e+01 3.795e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-19 21:54:34,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4548290.0, ans=0.0 2024-08-19 21:54:38,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4548290.0, ans=0.0 2024-08-19 21:54:43,483 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10250, loss[loss=0.08848, beats_loss=0.00811, ecapa_loss=0.0001433, whisper_loss=0.07894, over 12801.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01059, ecapa_loss=0.000139, whisper_loss=0.08864, over 3776104.59 frames. ], batch size: 50, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:55:01,551 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 21:55:10,035 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 25 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 21:55:19,895 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 17 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 21:55:34,611 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 21:55:43,627 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 21:55:47,219 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-19 21:56:21,618 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10300, loss[loss=0.09738, beats_loss=0.01056, ecapa_loss=0.0001789, whisper_loss=0.08503, over 14439.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001406, whisper_loss=0.08916, over 3798329.73 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:56:35,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4548890.0, ans=0.125 2024-08-19 21:56:36,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.16 vs. limit=10.0 2024-08-19 21:56:43,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.09 vs. limit=10.0 2024-08-19 21:56:45,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4548990.0, ans=0.2 2024-08-19 21:57:04,026 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 21:57:11,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4549090.0, ans=0.125 2024-08-19 21:57:22,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2024-08-19 21:57:26,590 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.432e+01 2024-08-19 21:57:31,570 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
22 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 21:57:50,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.288e+01 2.512e+01 2.813e+01 4.612e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-19 21:57:59,995 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10350, loss[loss=0.08695, beats_loss=0.01088, ecapa_loss=0.0001187, whisper_loss=0.07489, over 15276.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001411, whisper_loss=0.08894, over 3792515.84 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:58:01,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4549390.0, ans=0.1 2024-08-19 21:58:15,918 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 21:58:25,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4549490.0, ans=0.0 2024-08-19 21:58:27,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4549490.0, ans=0.125 2024-08-19 21:58:28,043 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 24 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-19 21:58:36,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4549490.0, ans=0.0 2024-08-19 21:58:46,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4549590.0, ans=0.1 2024-08-19 21:58:47,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.30 vs. 
limit=22.5 2024-08-19 21:59:02,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4549690.0, ans=0.125 2024-08-19 21:59:39,695 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10400, loss[loss=0.0827, beats_loss=0.01413, ecapa_loss=0.0001362, whisper_loss=0.06721, over 20658.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01054, ecapa_loss=0.0001409, whisper_loss=0.08832, over 3808108.15 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:59:45,897 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 22 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-19 21:59:47,728 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 21:59:49,805 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09229938685894012, model_norm_threshold=50.23906326293945 2024-08-19 21:59:49,971 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.352e+04, grad_sumsq=4.110e+06, orig_rms_sq=1.059e-02 2024-08-19 22:00:05,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4549990.0, ans=0.0 2024-08-19 22:00:15,245 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 22:00:18,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4549990.0, ans=0.125 2024-08-19 22:00:31,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4550090.0, ans=0.0 2024-08-19 22:00:40,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4550190.0, ans=0.1 2024-08-19 22:00:57,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4550190.0, ans=0.0 2024-08-19 22:01:14,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.360e+01 2.658e+01 2.930e+01 5.443e+02, threshold=5.316e+01, percent-clipped=2.0 2024-08-19 22:01:15,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-19 22:01:23,793 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10450, loss[loss=0.09676, beats_loss=0.01133, ecapa_loss=0.0001378, whisper_loss=0.08405, over 17069.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01054, ecapa_loss=0.0001418, whisper_loss=0.0883, over 3799452.53 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:01:26,054 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 22:01:29,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=4550390.0, ans=6.0 2024-08-19 22:01:44,563 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:01:47,287 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 22:01:50,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4550490.0, ans=0.2 2024-08-19 22:02:03,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4550490.0, ans=0.0 2024-08-19 22:02:15,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4550590.0, ans=0.1 2024-08-19 22:02:16,357 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 22:02:22,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4550590.0, ans=0.0 2024-08-19 22:02:27,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4550690.0, ans=0.125 2024-08-19 22:03:09,023 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10500, loss[loss=0.1043, beats_loss=0.0107, ecapa_loss=0.0001601, whisper_loss=0.09197, over 22330.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01044, ecapa_loss=0.0001429, whisper_loss=0.08893, over 3816165.44 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:03:14,314 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-19 22:03:15,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2024-08-19 22:03:15,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.94 vs. 
limit=15.0 2024-08-19 22:03:20,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4550890.0, ans=0.0 2024-08-19 22:03:20,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4550890.0, ans=0.95 2024-08-19 22:03:44,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2024-08-19 22:04:05,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4551090.0, ans=0.0 2024-08-19 22:04:23,367 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 22:04:38,012 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 19 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-19 22:04:43,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.385e+01 2.732e+01 3.050e+01 1.536e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-19 22:04:55,220 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10550, loss[loss=0.08129, beats_loss=0.009301, ecapa_loss=0.0001292, whisper_loss=0.07069, over 13470.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001426, whisper_loss=0.08894, over 3860245.77 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:05:09,184 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 18 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-19 22:05:20,598 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
26 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 22:05:26,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4551490.0, ans=0.125 2024-08-19 22:05:29,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4551490.0, ans=0.125 2024-08-19 22:05:37,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4551590.0, ans=0.07 2024-08-19 22:06:01,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4551690.0, ans=0.125 2024-08-19 22:06:16,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4551690.0, ans=0.07 2024-08-19 22:06:23,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4551790.0, ans=0.05 2024-08-19 22:06:33,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=15.0 2024-08-19 22:06:46,838 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10600, loss[loss=0.09134, beats_loss=0.01062, ecapa_loss=0.0001519, whisper_loss=0.0792, over 17410.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001421, whisper_loss=0.08991, over 3836278.89 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:06:55,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4551890.0, ans=0.2 2024-08-19 22:07:04,912 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 22:07:47,250 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:07:53,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4552190.0, ans=0.125 2024-08-19 22:07:53,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4552190.0, ans=0.0 2024-08-19 22:08:00,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4552190.0, ans=0.2 2024-08-19 22:08:00,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4552190.0, ans=0.125 2024-08-19 22:08:21,880 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.245e+01 2.495e+01 2.796e+01 7.063e+01, threshold=4.991e+01, percent-clipped=1.0 2024-08-19 22:08:32,210 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10650, loss[loss=0.1103, beats_loss=0.01041, ecapa_loss=0.000139, whisper_loss=0.09847, over 22169.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001418, whisper_loss=0.08991, over 3814502.06 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:08:38,251 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 26 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 22:08:43,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.02 vs. 
limit=15.0 2024-08-19 22:09:15,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4552590.0, ans=0.0 2024-08-19 22:09:25,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4552590.0, ans=0.0 2024-08-19 22:09:49,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4552690.0, ans=0.125 2024-08-19 22:10:03,599 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.364e+00 2024-08-19 22:10:15,048 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10700, loss[loss=0.08911, beats_loss=0.0113, ecapa_loss=0.0001311, whisper_loss=0.07651, over 14325.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001411, whisper_loss=0.08956, over 3783226.74 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:10:16,690 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 22:10:35,495 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 16 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 22:10:38,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4552990.0, ans=0.0 2024-08-19 22:10:38,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-08-19 22:10:40,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4552990.0, ans=0.125 2024-08-19 22:10:41,936 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 22:10:49,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4552990.0, ans=0.0 2024-08-19 22:11:13,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4553090.0, ans=0.125 2024-08-19 22:11:15,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4553090.0, ans=0.2 2024-08-19 22:11:36,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4553190.0, ans=0.5 2024-08-19 22:11:41,113 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 27 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 22:11:53,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. limit=22.5 2024-08-19 22:11:54,024 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.310e+01 2.569e+01 2.819e+01 4.934e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-19 22:11:54,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4553290.0, ans=0.09899494936611666 2024-08-19 22:12:02,519 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-19 22:12:03,575 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10750, loss[loss=0.1082, beats_loss=0.009833, ecapa_loss=0.0001684, whisper_loss=0.09669, over 14922.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001408, whisper_loss=0.08931, over 3811216.42 frames. 
], batch size: 57, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:12:34,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4553490.0, ans=0.09899494936611666 2024-08-19 22:12:38,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4553490.0, ans=0.0 2024-08-19 22:12:53,629 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 22:13:04,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4553690.0, ans=0.0 2024-08-19 22:13:27,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4553790.0, ans=0.0 2024-08-19 22:13:27,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2024-08-19 22:13:33,266 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 22:13:44,357 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10800, loss[loss=0.1095, beats_loss=0.01034, ecapa_loss=0.0001475, whisper_loss=0.09773, over 22833.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001401, whisper_loss=0.09026, over 3822755.35 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:13:54,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4553890.0, ans=0.125 2024-08-19 22:14:35,385 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 38 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 22:15:09,053 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 22:15:16,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.322e+01 2.506e+01 2.895e+01 8.208e+01, threshold=5.011e+01, percent-clipped=1.0 2024-08-19 22:15:26,482 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10850, loss[loss=0.1014, beats_loss=0.01088, ecapa_loss=0.00013, whisper_loss=0.08922, over 22860.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08948, over 3827318.28 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:15:28,741 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-19 22:15:33,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4554390.0, ans=0.125 2024-08-19 22:16:14,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4554590.0, ans=0.125 2024-08-19 22:16:16,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4554590.0, ans=0.125 2024-08-19 22:16:20,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4554590.0, ans=0.1 2024-08-19 22:16:34,412 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 15 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 22:17:09,255 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 22:17:11,279 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10900, loss[loss=0.1141, beats_loss=0.008413, ecapa_loss=0.0001544, whisper_loss=0.1041, over 17954.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.08998, over 3821070.38 frames. 
], batch size: 68, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:17:14,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4554890.0, ans=0.125 2024-08-19 22:17:14,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4554890.0, ans=0.125 2024-08-19 22:18:04,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4555090.0, ans=0.95 2024-08-19 22:18:10,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-19 22:18:34,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-19 22:18:51,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.334e+01 2.577e+01 2.862e+01 5.392e+01, threshold=5.154e+01, percent-clipped=2.0 2024-08-19 22:18:51,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4555290.0, ans=0.125 2024-08-19 22:19:03,222 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 10950, loss[loss=0.0847, beats_loss=0.01028, ecapa_loss=0.0001781, whisper_loss=0.07264, over 17694.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.08997, over 3814506.31 frames. 
], batch size: 78, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:19:07,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4555390.0, ans=0.125 2024-08-19 22:19:10,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.82 vs. limit=22.5 2024-08-19 22:19:33,354 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 22:19:52,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4555590.0, ans=0.125 2024-08-19 22:20:18,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4555690.0, ans=0.125 2024-08-19 22:20:37,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4555790.0, ans=0.125 2024-08-19 22:20:39,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-19 22:20:57,498 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11000, loss[loss=0.09953, beats_loss=0.01265, ecapa_loss=0.0001329, whisper_loss=0.08555, over 20469.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.08957, over 3815076.44 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:21:07,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4555890.0, ans=0.09899494936611666 2024-08-19 22:21:23,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. 
limit=15.0 2024-08-19 22:22:15,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2024-08-19 22:22:40,252 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.290e+01 2.533e+01 2.932e+01 4.213e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-19 22:22:51,780 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11050, loss[loss=0.08987, beats_loss=0.009142, ecapa_loss=0.0001507, whisper_loss=0.07922, over 14903.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001421, whisper_loss=0.09021, over 3851820.98 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:23:00,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2024-08-19 22:23:09,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4556390.0, ans=0.0 2024-08-19 22:23:46,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2024-08-19 22:24:29,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2024-08-19 22:24:45,939 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11100, loss[loss=0.09219, beats_loss=0.008792, ecapa_loss=0.000144, whisper_loss=0.08196, over 17869.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001411, whisper_loss=0.09036, over 3870461.65 frames. 
], batch size: 68, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:24:49,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4556890.0, ans=0.09899494936611666 2024-08-19 22:24:50,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4556890.0, ans=0.125 2024-08-19 22:24:53,218 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 22:24:54,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2024-08-19 22:24:55,367 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 22:25:14,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4556990.0, ans=0.0 2024-08-19 22:25:35,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4557090.0, ans=0.125 2024-08-19 22:25:42,513 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 22:26:18,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=4557290.0, ans=0.1 2024-08-19 22:26:19,856 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 22:26:30,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.214e+01 2.398e+01 2.634e+01 3.893e+01, threshold=4.795e+01, percent-clipped=0.0 2024-08-19 22:26:31,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4557290.0, ans=0.1 2024-08-19 22:26:41,243 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11150, loss[loss=0.1087, beats_loss=0.01212, ecapa_loss=0.0001218, whisper_loss=0.09532, over 14691.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001415, whisper_loss=0.08992, over 3846673.19 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:26:53,858 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 22:28:39,602 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11200, loss[loss=0.137, beats_loss=0.007695, ecapa_loss=0.000164, whisper_loss=0.1277, over 22489.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001415, whisper_loss=0.09036, over 3877728.79 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:29:19,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4557990.0, ans=0.125 2024-08-19 22:29:46,490 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 22:29:58,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4558190.0, ans=0.0 2024-08-19 22:30:19,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4558290.0, ans=0.2 2024-08-19 22:30:28,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4558290.0, ans=0.125 2024-08-19 22:30:34,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.356e+01 2.585e+01 2.931e+01 9.399e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-19 22:30:39,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.92 vs. limit=22.5 2024-08-19 22:30:47,229 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11250, loss[loss=0.09704, beats_loss=0.01289, ecapa_loss=0.0001001, whisper_loss=0.08315, over 23065.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.09033, over 3881037.67 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:32:00,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=12.0 2024-08-19 22:32:21,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4558790.0, ans=0.04949747468305833 2024-08-19 22:32:46,941 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11300, loss[loss=0.1165, beats_loss=0.007034, ecapa_loss=0.0001795, whisper_loss=0.1077, over 20327.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001403, whisper_loss=0.09031, over 3869580.06 frames. 
], batch size: 82, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:33:22,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.60 vs. limit=10.0 2024-08-19 22:33:30,183 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 22:33:43,651 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 22:33:48,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4559090.0, ans=0.125 2024-08-19 22:34:07,827 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 25 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 22:34:10,304 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 22:34:23,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4559290.0, ans=0.0 2024-08-19 22:34:36,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.198e+01 2.361e+01 2.678e+01 4.685e+01, threshold=4.723e+01, percent-clipped=0.0 2024-08-19 22:34:36,874 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 22:34:39,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4559290.0, ans=0.0 2024-08-19 22:34:43,174 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 22:34:48,378 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11350, loss[loss=0.09614, beats_loss=0.009593, ecapa_loss=0.0001178, whisper_loss=0.08536, over 17507.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001403, whisper_loss=0.08996, over 3867384.89 frames. 
], batch size: 65, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:35:00,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-08-19 22:35:05,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4559390.0, ans=0.125 2024-08-19 22:35:18,456 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 22:36:25,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4559690.0, ans=0.125 2024-08-19 22:36:36,289 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 9 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 22:36:46,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4559790.0, ans=0.0 2024-08-19 22:36:47,378 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 22:36:52,065 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11400, loss[loss=0.08392, beats_loss=0.01281, ecapa_loss=0.0001315, whisper_loss=0.06979, over 15042.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.08989, over 3837724.66 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:36:59,828 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-19 22:37:04,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. 
limit=6.0 2024-08-19 22:37:13,638 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:37:17,159 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 22:38:43,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4560290.0, ans=0.0 2024-08-19 22:38:44,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.308e+01 2.574e+01 2.916e+01 2.255e+02, threshold=5.148e+01, percent-clipped=1.0 2024-08-19 22:38:44,754 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 22:38:47,414 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 fro AS 2024-08-19 22:38:50,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4560290.0, ans=0.0 2024-08-19 22:38:56,448 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11450, loss[loss=0.08961, beats_loss=0.01102, ecapa_loss=0.0001602, whisper_loss=0.07698, over 22535.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001389, whisper_loss=0.08959, over 3829011.06 frames. ], batch size: 93, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:39:30,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4560490.0, ans=10.0 2024-08-19 22:39:35,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4560490.0, ans=0.125 2024-08-19 22:39:44,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.36 vs. 
limit=12.0 2024-08-19 22:39:51,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2024-08-19 22:40:36,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4560790.0, ans=0.125 2024-08-19 22:40:45,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4560790.0, ans=0.1 2024-08-19 22:40:53,906 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 22:40:55,974 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11500, loss[loss=0.1115, beats_loss=0.009082, ecapa_loss=0.0001165, whisper_loss=0.1013, over 16218.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001399, whisper_loss=0.09057, over 3828758.64 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:40:59,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4560890.0, ans=0.125 2024-08-19 22:40:59,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4560890.0, ans=10.0 2024-08-19 22:41:07,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4560890.0, ans=0.125 2024-08-19 22:41:08,518 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 
29 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 22:41:20,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4560990.0, ans=0.125 2024-08-19 22:41:27,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4560990.0, ans=0.0 2024-08-19 22:41:30,441 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 19 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-19 22:41:37,480 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 22:42:39,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.481e+01 2.784e+01 2.057e+02, threshold=4.962e+01, percent-clipped=3.0 2024-08-19 22:42:48,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4561290.0, ans=0.1 2024-08-19 22:42:51,364 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11550, loss[loss=0.1056, beats_loss=0.009444, ecapa_loss=0.0001305, whisper_loss=0.09487, over 16314.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.09039, over 3834523.48 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:42:57,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4561390.0, ans=0.125 2024-08-19 22:43:14,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4561490.0, ans=0.04949747468305833 2024-08-19 22:43:55,854 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 22:44:33,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4561790.0, ans=0.0 2024-08-19 22:44:42,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4561890.0, ans=0.125 2024-08-19 22:44:43,868 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11600, loss[loss=0.1259, beats_loss=0.01105, ecapa_loss=0.0001087, whisper_loss=0.1137, over 20147.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.09089, over 3840367.65 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:44:47,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4561890.0, ans=0.1 2024-08-19 22:45:02,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4561990.0, ans=0.0 2024-08-19 22:45:23,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4562090.0, ans=0.125 2024-08-19 22:45:55,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4562190.0, ans=0.0 2024-08-19 22:46:07,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=4562290.0, ans=0.1 2024-08-19 22:46:09,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.05 vs. 
limit=12.0 2024-08-19 22:46:10,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.615e+01 2.277e+01 2.484e+01 2.833e+01 4.332e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 22:46:19,764 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11650, loss[loss=0.09206, beats_loss=0.01062, ecapa_loss=0.0001547, whisper_loss=0.07989, over 17999.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.09036, over 3854626.24 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:46:24,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4562390.0, ans=0.125 2024-08-19 22:46:30,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4562390.0, ans=0.0 2024-08-19 22:46:45,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4562490.0, ans=0.125 2024-08-19 22:46:47,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4562490.0, ans=0.125 2024-08-19 22:47:02,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2024-08-19 22:47:15,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4562590.0, ans=0.1 2024-08-19 22:47:17,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4562590.0, ans=0.125 2024-08-19 22:47:18,759 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 20 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 22:47:35,862 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 22:47:39,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5 2024-08-19 22:47:46,230 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 22:47:50,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4562790.0, ans=0.125 2024-08-19 22:47:58,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4562790.0, ans=0.2 2024-08-19 22:47:59,953 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 22:48:01,571 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11700, loss[loss=0.1151, beats_loss=0.007524, ecapa_loss=0.0001339, whisper_loss=0.1063, over 14498.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01025, ecapa_loss=0.0001414, whisper_loss=0.09078, over 3840469.01 frames. ], batch size: 55, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:48:01,854 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 22:48:15,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4562890.0, ans=0.125 2024-08-19 22:48:19,183 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:48:28,778 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 22:48:32,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=4562990.0, ans=12.0 2024-08-19 22:48:42,055 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
16 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-19 22:48:45,324 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 22:48:56,722 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.526e+01 2024-08-19 22:49:17,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4563190.0, ans=0.1 2024-08-19 22:49:34,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.355e+01 2.582e+01 2.983e+01 7.922e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-19 22:49:42,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4563290.0, ans=0.2 2024-08-19 22:49:43,124 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 22:49:47,375 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11750, loss[loss=0.08171, beats_loss=0.009173, ecapa_loss=0.0001731, whisper_loss=0.07081, over 15373.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01028, ecapa_loss=0.0001407, whisper_loss=0.08994, over 3843889.20 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-19 22:49:52,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4563390.0, ans=0.0 2024-08-19 22:50:12,748 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 22:50:52,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4563590.0, ans=0.125 2024-08-19 22:51:34,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. 
limit=10.0 2024-08-19 22:51:42,169 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11800, loss[loss=0.09692, beats_loss=0.01154, ecapa_loss=0.0001286, whisper_loss=0.08409, over 20532.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01029, ecapa_loss=0.0001409, whisper_loss=0.09045, over 3837079.67 frames. ], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:51:44,738 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 22:52:07,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4563990.0, ans=0.09899494936611666 2024-08-19 22:52:09,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4563990.0, ans=0.0 2024-08-19 22:52:13,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4563990.0, ans=0.125 2024-08-19 22:52:34,237 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 22:53:24,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.230e+01 2.402e+01 2.738e+01 3.572e+01, threshold=4.803e+01, percent-clipped=0.0 2024-08-19 22:53:33,758 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11850, loss[loss=0.1099, beats_loss=0.007811, ecapa_loss=0.0001745, whisper_loss=0.1003, over 21799.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001407, whisper_loss=0.08936, over 3815386.92 frames. 
], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:54:05,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4564490.0, ans=0.0 2024-08-19 22:54:42,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-19 22:54:55,010 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 22:55:00,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4564690.0, ans=0.1 2024-08-19 22:55:10,350 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-19 22:55:25,500 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11900, loss[loss=0.1072, beats_loss=0.009828, ecapa_loss=0.0001633, whisper_loss=0.09574, over 21762.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.09027, over 3833096.50 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:56:14,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4565090.0, ans=0.125 2024-08-19 22:56:23,590 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 12 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 22:56:30,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4565090.0, ans=0.125 2024-08-19 22:56:38,203 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 22:56:55,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.28 vs. 
limit=12.0 2024-08-19 22:57:00,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4565290.0, ans=0.125 2024-08-19 22:57:06,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+01 2.317e+01 2.609e+01 2.861e+01 6.356e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-19 22:57:15,627 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 11950, loss[loss=0.09719, beats_loss=0.01108, ecapa_loss=0.0001358, whisper_loss=0.08475, over 16124.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001412, whisper_loss=0.09063, over 3815545.22 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:57:15,865 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 22:58:11,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.06 vs. limit=22.5 2024-08-19 22:58:23,255 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 14 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 22:58:24,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4565690.0, ans=0.125 2024-08-19 22:58:43,928 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 38 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 22:58:51,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=16.69 vs. limit=15.0 2024-08-19 22:58:58,908 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12000, loss[loss=0.09721, beats_loss=0.01115, ecapa_loss=0.0001389, whisper_loss=0.08468, over 20724.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001407, whisper_loss=0.09109, over 3850688.25 frames. 
], batch size: 87, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:58:58,909 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-19 22:59:20,456 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.0500, 2.7609, 2.3327, 1.6030], device='cuda:3') 2024-08-19 22:59:35,973 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005134, whisper_loss=0.2483, over 931116.00 frames. 2024-08-19 23:00:01,564 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003987, beats_loss=0, ecapa_loss=0.0003987, whisper_loss=0, over 944235.00 frames. 2024-08-19 23:01:39,566 INFO [train_multi_KD3.py:1150] (3/4) Epoch 31, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 23:01:39,570 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-19 23:01:48,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2024-08-19 23:02:07,405 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 23:02:14,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4565990.0, ans=0.125 2024-08-19 23:02:27,861 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 23:02:30,621 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 23:02:35,884 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 23:02:39,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4566090.0, ans=0.125 2024-08-19 23:02:49,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4566190.0, ans=0.0 2024-08-19 23:02:53,716 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-19 23:02:57,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4566190.0, ans=0.0 2024-08-19 23:02:59,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4566190.0, ans=0.125 2024-08-19 23:03:12,779 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 23:03:21,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.392e+01 2.640e+01 2.904e+01 4.083e+01, threshold=5.280e+01, percent-clipped=0.0 2024-08-19 23:03:22,120 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 23:03:31,867 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12050, loss[loss=0.1222, beats_loss=0.01003, ecapa_loss=0.0001354, whisper_loss=0.1108, over 22669.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001413, whisper_loss=0.09025, over 3850269.40 frames. 
], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:03:35,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4566390.0, ans=0.125 2024-08-19 23:03:47,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4566390.0, ans=0.1 2024-08-19 23:03:55,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4566490.0, ans=0.125 2024-08-19 23:04:15,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4566490.0, ans=0.0 2024-08-19 23:04:24,782 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 23:04:53,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4566690.0, ans=0.125 2024-08-19 23:05:11,024 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04405633732676506, model_norm_threshold=52.79664611816406 2024-08-19 23:05:11,189 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.250e+05, grad_sumsq=2.117e+07, orig_rms_sq=1.063e-02 2024-08-19 23:05:19,537 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12100, loss[loss=0.1076, beats_loss=0.01082, ecapa_loss=0.000167, whisper_loss=0.09513, over 21802.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001421, whisper_loss=0.08992, over 3841627.03 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:05:20,038 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 18 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-19 23:05:29,639 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 23:06:00,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4567090.0, ans=0.2 2024-08-19 23:06:25,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=22.5 2024-08-19 23:06:37,598 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 23:06:38,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4567190.0, ans=0.0 2024-08-19 23:06:56,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4567290.0, ans=0.0 2024-08-19 23:06:57,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.332e+01 2.618e+01 3.087e+01 1.198e+03, threshold=5.236e+01, percent-clipped=2.0 2024-08-19 23:07:01,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4567290.0, ans=0.125 2024-08-19 23:07:05,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4567390.0, ans=0.1 2024-08-19 23:07:06,259 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12150, loss[loss=0.09325, beats_loss=0.01469, ecapa_loss=0.0001159, whisper_loss=0.0774, over 22668.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001438, whisper_loss=0.09016, over 3870095.13 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:07:08,396 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 15 from Vox, 50 fro AS 2024-08-19 23:07:19,426 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 23:07:40,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4567490.0, ans=0.125 2024-08-19 23:07:42,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4567490.0, ans=0.09899494936611666 2024-08-19 23:07:52,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4567590.0, ans=0.125 2024-08-19 23:07:57,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4567590.0, ans=0.125 2024-08-19 23:08:04,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4567590.0, ans=0.125 2024-08-19 23:08:21,808 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 23:08:22,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4567690.0, ans=0.0 2024-08-19 23:08:28,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4567790.0, ans=0.0 2024-08-19 23:08:49,861 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12200, loss[loss=0.09899, beats_loss=0.009672, ecapa_loss=0.0001506, whisper_loss=0.08781, over 17325.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001436, whisper_loss=0.08984, over 3853501.49 frames. 
], batch size: 69, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:08:50,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4567890.0, ans=0.2 2024-08-19 23:08:51,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=12.0 2024-08-19 23:08:56,637 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 23 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 23:09:08,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4567990.0, ans=0.125 2024-08-19 23:09:18,750 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 23:09:22,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4567990.0, ans=0.125 2024-08-19 23:09:22,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4567990.0, ans=0.07 2024-08-19 23:09:35,571 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 23:09:45,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4568090.0, ans=0.0 2024-08-19 23:10:03,497 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 37 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 23:10:18,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.48 vs. 
limit=22.5 2024-08-19 23:10:18,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.296e+01 2.498e+01 2.784e+01 3.860e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 23:10:26,700 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12250, loss[loss=0.1059, beats_loss=0.01094, ecapa_loss=0.0001066, whisper_loss=0.09386, over 22453.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001429, whisper_loss=0.08984, over 3833150.73 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:10:38,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.41 vs. limit=22.5 2024-08-19 23:10:56,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2024-08-19 23:10:57,729 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 26 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-19 23:11:19,872 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 23 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 23:11:34,616 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-19 23:11:46,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4568790.0, ans=0.1 2024-08-19 23:11:59,084 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 19 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 23:12:03,949 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12300, loss[loss=0.08464, beats_loss=0.01215, ecapa_loss=0.0001256, whisper_loss=0.07123, over 22366.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001419, whisper_loss=0.0893, over 3814203.73 frames. 
], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:12:15,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4568890.0, ans=0.125 2024-08-19 23:12:25,640 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 15 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 23:12:39,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4568990.0, ans=0.0 2024-08-19 23:12:46,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4569090.0, ans=0.2 2024-08-19 23:12:55,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.49 vs. limit=22.5 2024-08-19 23:13:03,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4569190.0, ans=0.0 2024-08-19 23:13:25,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2024-08-19 23:13:34,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.176e+01 2.435e+01 2.712e+01 4.279e+01, threshold=4.869e+01, percent-clipped=0.0 2024-08-19 23:13:34,374 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-19 23:13:40,638 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 23:13:42,440 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12350, loss[loss=0.1133, beats_loss=0.009899, ecapa_loss=0.000158, whisper_loss=0.1018, over 22125.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.08955, over 3849728.24 frames. 
], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:13:47,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4569390.0, ans=0.125 2024-08-19 23:13:51,959 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-19 23:14:04,818 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 23:14:09,583 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 22 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-19 23:14:12,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4569490.0, ans=0.1 2024-08-19 23:14:19,778 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 21 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 23:14:21,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4569590.0, ans=0.125 2024-08-19 23:14:21,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-19 23:14:26,854 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 23:14:36,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4569590.0, ans=0.1 2024-08-19 23:14:37,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4569590.0, ans=0.125 2024-08-19 23:14:44,286 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 24 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 23:14:47,255 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
26 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 23:15:08,861 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 23:15:24,712 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12400, loss[loss=0.08477, beats_loss=0.01019, ecapa_loss=0.0001412, whisper_loss=0.07317, over 19225.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.0895, over 3859171.37 frames. ], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:15:29,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4569890.0, ans=0.125 2024-08-19 23:15:37,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4569890.0, ans=0.2 2024-08-19 23:15:39,801 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 23:16:00,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4569990.0, ans=0.125 2024-08-19 23:16:58,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-19 23:17:00,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.374e+01 2.637e+01 2.909e+01 4.258e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-19 23:17:09,506 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12450, loss[loss=0.08661, beats_loss=0.01468, ecapa_loss=0.0001177, whisper_loss=0.07074, over 19764.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001401, whisper_loss=0.08921, over 3810588.05 frames. 
], batch size: 82, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:17:51,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4570590.0, ans=0.2 2024-08-19 23:17:57,916 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 23:18:09,352 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 28 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 23:18:30,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4570790.0, ans=0.1 2024-08-19 23:18:51,838 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12500, loss[loss=0.1014, beats_loss=0.01093, ecapa_loss=0.000116, whisper_loss=0.08928, over 18780.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001396, whisper_loss=0.0895, over 3794665.38 frames. ], batch size: 72, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:19:04,234 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-19 23:19:08,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4570890.0, ans=0.0 2024-08-19 23:19:24,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4570990.0, ans=0.2 2024-08-19 23:19:42,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4571090.0, ans=0.125 2024-08-19 23:19:46,561 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 23:19:52,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. 
limit=22.5 2024-08-19 23:20:03,018 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 23:20:37,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.234e+01 2.500e+01 2.814e+01 4.349e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-19 23:20:46,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4571390.0, ans=0.125 2024-08-19 23:20:47,943 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12550, loss[loss=0.08756, beats_loss=0.01124, ecapa_loss=0.0001206, whisper_loss=0.07512, over 20062.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001389, whisper_loss=0.08944, over 3765748.61 frames. ], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:20:48,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4571390.0, ans=0.125 2024-08-19 23:20:50,525 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 23:21:08,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4571490.0, ans=0.125 2024-08-19 23:21:14,479 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 23:21:15,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4571490.0, ans=0.125 2024-08-19 23:21:54,460 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
26 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 23:22:01,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4571690.0, ans=0.1 2024-08-19 23:22:37,394 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12600, loss[loss=0.109, beats_loss=0.01169, ecapa_loss=0.00014, whisper_loss=0.09591, over 19428.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01064, ecapa_loss=0.0001394, whisper_loss=0.08902, over 3756131.92 frames. ], batch size: 79, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:22:38,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4571890.0, ans=0.125 2024-08-19 23:22:39,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2024-08-19 23:22:44,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2024-08-19 23:23:02,153 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 23:23:12,107 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 23:23:12,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4571990.0, ans=0.0 2024-08-19 23:23:22,674 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 
28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 23:24:31,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4572290.0, ans=0.125 2024-08-19 23:24:32,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.292e+01 2.504e+01 2.662e+01 4.267e+01, threshold=5.008e+01, percent-clipped=0.0 2024-08-19 23:24:43,483 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12650, loss[loss=0.09531, beats_loss=0.01132, ecapa_loss=0.0001622, whisper_loss=0.08237, over 21794.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001385, whisper_loss=0.0895, over 3776081.96 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:25:10,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4572490.0, ans=0.0 2024-08-19 23:25:12,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. 
limit=15.0 2024-08-19 23:25:14,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4572490.0, ans=0.125 2024-08-19 23:25:28,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4572490.0, ans=0.125 2024-08-19 23:25:48,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4572590.0, ans=0.125 2024-08-19 23:25:53,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4572590.0, ans=0.125 2024-08-19 23:26:20,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4572790.0, ans=0.1 2024-08-19 23:26:34,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4572790.0, ans=0.125 2024-08-19 23:26:39,137 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12700, loss[loss=0.1127, beats_loss=0.01044, ecapa_loss=0.0001282, whisper_loss=0.1009, over 19126.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001396, whisper_loss=0.08941, over 3812434.67 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:26:51,445 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:26:59,564 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-19 23:27:03,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4572990.0, ans=0.125 2024-08-19 23:27:23,852 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 23:27:25,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4573090.0, ans=0.2 2024-08-19 23:27:41,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4573090.0, ans=0.0 2024-08-19 23:28:13,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4573290.0, ans=0.125 2024-08-19 23:28:19,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4573290.0, ans=0.0 2024-08-19 23:28:25,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.359e+01 2.539e+01 2.793e+01 4.602e+02, threshold=5.078e+01, percent-clipped=1.0 2024-08-19 23:28:34,453 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12750, loss[loss=0.1063, beats_loss=0.009559, ecapa_loss=0.000159, whisper_loss=0.09515, over 21996.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001394, whisper_loss=0.08931, over 3808080.69 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:29:09,723 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 
26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 23:29:43,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4573690.0, ans=0.1 2024-08-19 23:30:04,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4573690.0, ans=0.1 2024-08-19 23:30:09,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4573790.0, ans=0.125 2024-08-19 23:30:23,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4573790.0, ans=0.125 2024-08-19 23:30:34,024 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12800, loss[loss=0.0975, beats_loss=0.006972, ecapa_loss=0.0001721, whisper_loss=0.08881, over 12875.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.08972, over 3845949.47 frames. ], batch size: 52, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:31:46,783 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 23:31:52,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4574190.0, ans=0.125 2024-08-19 23:32:05,554 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 23:32:13,321 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
29 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-19 23:32:17,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4574290.0, ans=0.125 2024-08-19 23:32:27,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.310e+01 2.474e+01 2.739e+01 4.049e+01, threshold=4.949e+01, percent-clipped=0.0 2024-08-19 23:32:33,236 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 23:32:38,884 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12850, loss[loss=0.0894, beats_loss=0.01474, ecapa_loss=0.0001385, whisper_loss=0.07328, over 16131.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.09025, over 3833054.21 frames. ], batch size: 67, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:32:58,635 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 22 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 23:33:04,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4574490.0, ans=0.0 2024-08-19 23:33:13,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4574490.0, ans=0.015 2024-08-19 23:33:40,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.72 vs. 
limit=15.0 2024-08-19 23:33:47,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4574590.0, ans=0.0 2024-08-19 23:33:54,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4574690.0, ans=0.05 2024-08-19 23:34:01,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4574690.0, ans=0.07 2024-08-19 23:34:02,934 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 26 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 23:34:07,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2024-08-19 23:34:09,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4574690.0, ans=0.125 2024-08-19 23:34:21,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4574790.0, ans=0.125 2024-08-19 23:34:30,411 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 23:34:39,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4574790.0, ans=0.125 2024-08-19 23:34:40,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-08-19 23:34:40,923 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 23:34:43,112 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12900, loss[loss=0.1055, beats_loss=0.01202, ecapa_loss=0.0001168, whisper_loss=0.09234, over 22527.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001421, whisper_loss=0.09097, over 3868159.79 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:35:57,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4575190.0, ans=0.1 2024-08-19 23:36:02,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4575190.0, ans=0.1 2024-08-19 23:36:03,426 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 23:36:09,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4575190.0, ans=0.5 2024-08-19 23:36:12,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4575190.0, ans=0.2 2024-08-19 23:36:25,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4575290.0, ans=0.0 2024-08-19 23:36:34,928 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.306e+01 2.601e+01 3.029e+01 4.481e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-19 23:36:44,574 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 12950, loss[loss=0.09341, beats_loss=0.01125, ecapa_loss=0.0001515, whisper_loss=0.08064, over 17779.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.000142, whisper_loss=0.09043, over 3858069.14 frames. ], batch size: 72, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:36:53,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.71 vs. 
limit=15.0 2024-08-19 23:36:55,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2024-08-19 23:36:57,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4575390.0, ans=0.1 2024-08-19 23:37:52,856 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 23:38:38,035 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-19 23:38:42,776 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13000, loss[loss=0.09659, beats_loss=0.01128, ecapa_loss=0.000135, whisper_loss=0.08396, over 18537.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001413, whisper_loss=0.09046, over 3841521.06 frames. ], batch size: 77, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:39:07,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4575990.0, ans=0.0 2024-08-19 23:39:18,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. 
limit=22.5 2024-08-19 23:39:29,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4576090.0, ans=0.0 2024-08-19 23:39:39,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4576090.0, ans=0.125 2024-08-19 23:39:45,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4576090.0, ans=0.125 2024-08-19 23:40:06,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4576190.0, ans=0.125 2024-08-19 23:40:12,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.78 vs. limit=10.0 2024-08-19 23:40:21,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4576290.0, ans=0.2 2024-08-19 23:40:24,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4576290.0, ans=0.125 2024-08-19 23:40:30,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.284e+01 2.434e+01 2.790e+01 4.214e+01, threshold=4.868e+01, percent-clipped=0.0 2024-08-19 23:40:38,334 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13050, loss[loss=0.1023, beats_loss=0.01261, ecapa_loss=0.0001244, whisper_loss=0.08847, over 22439.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001418, whisper_loss=0.09072, over 3812121.27 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:40:45,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4576390.0, ans=0.125 2024-08-19 23:40:54,967 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 23:41:18,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4576590.0, ans=0.125 2024-08-19 23:41:28,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4576590.0, ans=0.125 2024-08-19 23:41:35,928 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06296969205141068, model_norm_threshold=48.684600830078125 2024-08-19 23:41:36,093 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.824e+04, grad_sumsq=7.824e+04, orig_rms_sq=1.000e+00 2024-08-19 23:41:48,366 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 23:41:58,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4576690.0, ans=0.5 2024-08-19 23:42:01,283 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:42:19,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4576790.0, ans=0.125 2024-08-19 23:42:22,015 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13100, loss[loss=0.09127, beats_loss=0.01105, ecapa_loss=0.000142, whisper_loss=0.0788, over 18535.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001425, whisper_loss=0.08991, over 3769002.33 frames. 
], batch size: 71, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:42:30,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4576890.0, ans=15.0 2024-08-19 23:42:38,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4576890.0, ans=0.125 2024-08-19 23:42:55,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4576990.0, ans=0.125 2024-08-19 23:43:07,505 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 23:43:19,987 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.890e-01 2024-08-19 23:43:31,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4577190.0, ans=0.2 2024-08-19 23:43:41,884 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 23:43:47,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4577190.0, ans=0.0 2024-08-19 23:43:51,553 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 23:44:02,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.292e+01 2.525e+01 2.909e+01 7.731e+02, threshold=5.050e+01, percent-clipped=3.0 2024-08-19 23:44:10,314 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13150, loss[loss=0.1138, beats_loss=0.00778, ecapa_loss=0.0001469, whisper_loss=0.1045, over 16473.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001427, whisper_loss=0.08972, over 3734934.73 frames. 
], batch size: 65, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:44:16,381 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 23:44:21,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4577390.0, ans=0.125 2024-08-19 23:44:27,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-08-19 23:44:27,868 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 23:44:36,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4577490.0, ans=15.0 2024-08-19 23:44:43,009 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 23:44:53,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.30 vs. limit=22.5 2024-08-19 23:44:56,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4577590.0, ans=0.125 2024-08-19 23:44:56,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4577590.0, ans=0.125 2024-08-19 23:45:18,125 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 14 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 23:45:29,039 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 23:45:43,174 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13200, loss[loss=0.1059, beats_loss=0.008759, ecapa_loss=0.0001967, whisper_loss=0.09519, over 19806.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001422, whisper_loss=0.08967, over 3736795.84 frames. ], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:46:04,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-19 23:46:11,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=12.0 2024-08-19 23:46:29,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4578090.0, ans=0.2 2024-08-19 23:46:39,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4578190.0, ans=0.2 2024-08-19 23:46:41,550 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-19 23:46:49,532 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 23:46:52,863 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 14 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 23:46:55,075 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-19 23:47:02,683 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 23:47:05,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.290e+01 2.445e+01 2.755e+01 3.843e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-19 23:47:08,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4578290.0, ans=0.125 2024-08-19 23:47:08,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4578290.0, ans=0.07 2024-08-19 23:47:12,579 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13250, loss[loss=0.126, beats_loss=0.007977, ecapa_loss=0.0001343, whisper_loss=0.1167, over 18317.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001419, whisper_loss=0.08983, over 3757871.04 frames. ], batch size: 73, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:47:44,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4578490.0, ans=0.0 2024-08-19 23:47:45,756 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 23:47:48,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4578490.0, ans=0.2 2024-08-19 23:47:58,498 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 23:47:58,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4578590.0, ans=0.125 2024-08-19 23:48:02,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4578590.0, ans=15.0 2024-08-19 23:48:50,766 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 
31 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 23:48:51,813 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13300, loss[loss=0.1076, beats_loss=0.01122, ecapa_loss=0.0001568, whisper_loss=0.0948, over 22347.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001417, whisper_loss=0.09064, over 3808251.50 frames. ], batch size: 95, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:49:27,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4579090.0, ans=0.125 2024-08-19 23:49:29,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4579090.0, ans=0.125 2024-08-19 23:49:35,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4579090.0, ans=0.07 2024-08-19 23:49:38,194 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 23:49:39,971 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 23:49:53,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4579190.0, ans=0.1 2024-08-19 23:50:00,698 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 13 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-19 23:50:12,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4579290.0, ans=0.0 2024-08-19 23:50:17,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.270e+01 2.522e+01 2.849e+01 4.114e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 23:50:18,030 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 23:50:24,321 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13350, loss[loss=0.09084, beats_loss=0.01141, ecapa_loss=0.0001445, whisper_loss=0.07799, over 16384.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001416, whisper_loss=0.09083, over 3779917.89 frames. ], batch size: 65, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:50:25,276 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 30 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 23:50:27,033 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 23:50:44,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2024-08-19 23:51:27,583 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 23:51:29,511 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 23:51:44,099 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 23:51:58,125 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13400, loss[loss=0.105, beats_loss=0.01133, ecapa_loss=0.000148, whisper_loss=0.09217, over 23612.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.000141, whisper_loss=0.0906, over 3768933.63 frames. ], batch size: 94, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:52:04,084 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 23:52:19,138 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 23:52:40,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4580090.0, ans=0.125 2024-08-19 23:53:23,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4580290.0, ans=0.0 2024-08-19 23:53:25,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.574e+01 2.343e+01 2.589e+01 2.932e+01 2.538e+02, threshold=5.179e+01, percent-clipped=4.0 2024-08-19 23:53:26,033 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 23:53:32,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4580390.0, ans=0.0 2024-08-19 23:53:33,090 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13450, loss[loss=0.1089, beats_loss=0.009013, ecapa_loss=0.0001252, whisper_loss=0.09868, over 23088.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001411, whisper_loss=0.09054, over 3742884.63 frames. ], batch size: 87, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:53:38,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2024-08-19 23:53:59,317 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 23:54:10,178 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 23:54:11,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4580590.0, ans=0.0 2024-08-19 23:54:14,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. 
limit=15.0 2024-08-19 23:54:26,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4580590.0, ans=0.125 2024-08-19 23:54:43,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4580690.0, ans=0.0 2024-08-19 23:54:54,599 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 24 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-19 23:55:03,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4580790.0, ans=0.09899494936611666 2024-08-19 23:55:10,927 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13500, loss[loss=0.1169, beats_loss=0.01013, ecapa_loss=0.0001393, whisper_loss=0.1054, over 19245.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001425, whisper_loss=0.09044, over 3773304.16 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:55:14,862 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 23:55:23,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4580890.0, ans=0.2 2024-08-19 23:56:13,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4581190.0, ans=0.1 2024-08-19 23:56:14,709 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 23:56:19,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4581190.0, ans=0.0 2024-08-19 23:56:20,767 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-19 23:56:33,122 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 
24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 23:56:35,049 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 23:56:36,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.361e+01 2.616e+01 2.856e+01 5.147e+01, threshold=5.232e+01, percent-clipped=0.0 2024-08-19 23:56:43,175 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13550, loss[loss=0.1173, beats_loss=0.009134, ecapa_loss=0.0001261, whisper_loss=0.1069, over 19351.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.000142, whisper_loss=0.09051, over 3766452.49 frames. ], batch size: 73, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:56:46,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4581390.0, ans=0.05 2024-08-19 23:56:46,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.89 vs. limit=22.5 2024-08-19 23:56:49,176 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 23:56:56,312 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 16 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 23:57:05,357 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
19 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 23:57:36,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4581590.0, ans=0.0 2024-08-19 23:57:37,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4581690.0, ans=0.0 2024-08-19 23:57:39,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4581690.0, ans=0.125 2024-08-19 23:57:59,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4581790.0, ans=0.2 2024-08-19 23:58:17,089 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13600, loss[loss=0.09189, beats_loss=0.01194, ecapa_loss=0.0001177, whisper_loss=0.07878, over 16187.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.000141, whisper_loss=0.09039, over 3768800.70 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:58:18,834 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 23:58:50,816 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 31 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-19 23:59:08,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4582090.0, ans=0.09899494936611666 2024-08-19 23:59:38,724 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 23:59:39,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.261e+01 2.439e+01 2.757e+01 6.326e+01, threshold=4.878e+01, percent-clipped=1.0 2024-08-19 23:59:47,348 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13650, loss[loss=0.09865, beats_loss=0.01022, ecapa_loss=0.0001334, whisper_loss=0.08709, over 20940.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001418, whisper_loss=0.08945, over 3757500.47 frames. ], batch size: 82, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:59:54,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.71 vs. limit=6.0 2024-08-20 00:00:08,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4582490.0, ans=0.125 2024-08-20 00:00:20,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4582490.0, ans=0.125 2024-08-20 00:00:40,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4582590.0, ans=0.125 2024-08-20 00:01:17,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4582790.0, ans=0.2 2024-08-20 00:01:20,260 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13700, loss[loss=0.09916, beats_loss=0.01076, ecapa_loss=0.0001755, whisper_loss=0.08665, over 17901.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001418, whisper_loss=0.08995, over 3802685.57 frames. 
], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:01:24,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4582890.0, ans=0.0 2024-08-20 00:01:27,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4582890.0, ans=0.0 2024-08-20 00:01:43,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4582990.0, ans=0.125 2024-08-20 00:01:49,869 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 00:02:18,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4583190.0, ans=15.0 2024-08-20 00:02:41,762 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 00:02:46,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.351e+01 2.599e+01 2.817e+01 2.023e+02, threshold=5.198e+01, percent-clipped=1.0 2024-08-20 00:02:53,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4583390.0, ans=0.0 2024-08-20 00:02:54,319 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13750, loss[loss=0.09989, beats_loss=0.01222, ecapa_loss=0.0001344, whisper_loss=0.08632, over 20517.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001407, whisper_loss=0.08932, over 3791320.43 frames. ], batch size: 83, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:03:01,427 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 00:03:03,465 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
31 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 00:03:14,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4583490.0, ans=0.125 2024-08-20 00:03:20,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4583490.0, ans=0.125 2024-08-20 00:03:47,648 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 23 from LS+wenet, 24 from Vox, 49 fro AS 2024-08-20 00:04:18,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4583790.0, ans=0.0 2024-08-20 00:04:27,659 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13800, loss[loss=0.1031, beats_loss=0.007648, ecapa_loss=0.0001316, whisper_loss=0.09416, over 17185.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001409, whisper_loss=0.08943, over 3786344.27 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:04:35,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4583890.0, ans=0.125 2024-08-20 00:04:45,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4583990.0, ans=0.0 2024-08-20 00:04:46,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4583990.0, ans=0.125 2024-08-20 00:04:56,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4583990.0, ans=0.0 2024-08-20 00:05:03,046 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 19 from LS+wenet, 25 from Vox, 49 fro AS 2024-08-20 00:05:08,012 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
25 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 00:05:30,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4584190.0, ans=0.0 2024-08-20 00:05:41,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=12.0 2024-08-20 00:05:51,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.269e+01 2.538e+01 2.800e+01 5.388e+01, threshold=5.076e+01, percent-clipped=1.0 2024-08-20 00:05:57,940 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13850, loss[loss=0.1093, beats_loss=0.009971, ecapa_loss=0.0001519, whisper_loss=0.09786, over 14598.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01043, ecapa_loss=0.0001412, whisper_loss=0.08904, over 3769006.25 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:06:17,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-20 00:06:36,493 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 19 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 00:07:09,289 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 00:07:13,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4584790.0, ans=0.125 2024-08-20 00:07:19,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. 
limit=15.0 2024-08-20 00:07:22,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4584790.0, ans=0.125 2024-08-20 00:07:25,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4584790.0, ans=0.125 2024-08-20 00:07:30,776 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13900, loss[loss=0.09473, beats_loss=0.008934, ecapa_loss=0.0002157, whisper_loss=0.08364, over 17552.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001408, whisper_loss=0.08929, over 3763363.16 frames. ], batch size: 76, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:07:45,225 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 35 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 00:07:54,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4584990.0, ans=0.0 2024-08-20 00:08:12,100 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-20 00:08:14,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4585090.0, ans=0.1 2024-08-20 00:08:24,704 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 00:08:30,068 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 00:08:36,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4585190.0, ans=0.125 2024-08-20 00:08:47,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.56 vs. 
limit=22.5 2024-08-20 00:08:56,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.301e+01 2.533e+01 2.957e+01 6.862e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-20 00:08:59,833 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 00:09:03,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.93 vs. limit=6.0 2024-08-20 00:09:03,982 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 13950, loss[loss=0.1092, beats_loss=0.009569, ecapa_loss=0.0001921, whisper_loss=0.09774, over 14180.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001434, whisper_loss=0.08892, over 3761995.80 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:09:12,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-20 00:09:27,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4585490.0, ans=0.125 2024-08-20 00:09:34,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4585490.0, ans=0.125 2024-08-20 00:09:54,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2024-08-20 00:09:56,473 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-20 00:09:57,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4585590.0, ans=0.125 2024-08-20 00:10:01,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2024-08-20 00:10:20,101 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 00:10:21,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4585790.0, ans=0.125 2024-08-20 00:10:28,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4585790.0, ans=0.125 2024-08-20 00:10:29,283 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 00:10:30,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0 2024-08-20 00:10:40,049 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14000, loss[loss=0.08665, beats_loss=0.01117, ecapa_loss=0.0001677, whisper_loss=0.0738, over 22162.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001429, whisper_loss=0.0894, over 3792876.37 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:11:22,346 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 00:12:00,549 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 00:12:00,997 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:12:03,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4586290.0, ans=0.0 2024-08-20 00:12:07,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.217e+01 2.438e+01 2.736e+01 1.084e+02, threshold=4.877e+01, percent-clipped=1.0 2024-08-20 00:12:15,040 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14050, loss[loss=0.08834, beats_loss=0.009752, ecapa_loss=0.0001344, whisper_loss=0.07724, over 13383.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001427, whisper_loss=0.09008, over 3790847.08 frames. ], batch size: 52, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:12:20,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4586390.0, ans=0.125 2024-08-20 00:12:28,159 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 00:12:54,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4586590.0, ans=0.125 2024-08-20 00:13:05,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4586590.0, ans=0.0 2024-08-20 00:13:32,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4586790.0, ans=0.05 2024-08-20 00:13:46,538 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 00:13:49,163 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14100, loss[loss=0.1033, beats_loss=0.01122, ecapa_loss=0.000102, whisper_loss=0.09103, over 16205.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.000141, whisper_loss=0.08986, over 3844801.68 frames. ], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:13:52,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4586890.0, ans=0.0 2024-08-20 00:13:52,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-08-20 00:13:58,856 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 00:14:04,997 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:14:08,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4586990.0, ans=0.2 2024-08-20 00:14:15,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4586990.0, ans=0.125 2024-08-20 00:14:18,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4586990.0, ans=0.125 2024-08-20 00:14:27,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4587090.0, ans=0.0 2024-08-20 00:14:39,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.84 vs. 
limit=15.0 2024-08-20 00:14:40,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4587090.0, ans=0.0 2024-08-20 00:14:42,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4587090.0, ans=0.1 2024-08-20 00:15:04,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4587290.0, ans=0.0 2024-08-20 00:15:18,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.279e+01 2.555e+01 2.827e+01 5.250e+01, threshold=5.111e+01, percent-clipped=1.0 2024-08-20 00:15:24,382 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14150, loss[loss=0.1107, beats_loss=0.009073, ecapa_loss=0.0001392, whisper_loss=0.1002, over 18057.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001416, whisper_loss=0.09059, over 3820129.74 frames. ], batch size: 71, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:15:35,120 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 00:15:41,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4587490.0, ans=0.0 2024-08-20 00:15:55,441 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 00:16:18,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4587590.0, ans=0.2 2024-08-20 00:16:20,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4587590.0, ans=0.0 2024-08-20 00:16:28,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4587690.0, ans=0.125 2024-08-20 00:16:30,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=12.0 2024-08-20 00:16:42,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4587790.0, ans=0.0 2024-08-20 00:16:59,917 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14200, loss[loss=0.1157, beats_loss=0.01005, ecapa_loss=0.0001637, whisper_loss=0.104, over 20212.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001412, whisper_loss=0.0907, over 3799951.49 frames. ], batch size: 82, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:17:01,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4587890.0, ans=0.125 2024-08-20 00:17:02,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4587890.0, ans=0.125 2024-08-20 00:17:05,761 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 00:17:18,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4587990.0, ans=0.0 2024-08-20 00:17:26,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4587990.0, ans=0.1 2024-08-20 00:17:31,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4587990.0, ans=0.0 2024-08-20 00:17:35,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4588090.0, ans=0.125 2024-08-20 00:18:12,455 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 00:18:27,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.209e+01 2.487e+01 2.835e+01 4.985e+01, threshold=4.974e+01, percent-clipped=0.0 2024-08-20 00:18:33,066 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14250, loss[loss=0.1125, beats_loss=0.009244, ecapa_loss=0.0001471, whisper_loss=0.1018, over 23181.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001411, whisper_loss=0.09085, over 3825656.08 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:18:54,590 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 00:19:04,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4588490.0, ans=0.2 2024-08-20 00:19:16,519 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 00:20:06,757 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14300, loss[loss=0.08217, beats_loss=0.01255, ecapa_loss=0.000123, whisper_loss=0.06839, over 13923.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001405, whisper_loss=0.09046, over 3767797.21 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:20:40,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4588990.0, ans=0.125 2024-08-20 00:20:50,831 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 16 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 00:21:01,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2024-08-20 00:21:06,075 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 39 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 00:21:09,780 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 25 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 00:21:35,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4589290.0, ans=0.2 2024-08-20 00:21:36,003 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.288e+01 2.504e+01 2.843e+01 5.964e+01, threshold=5.008e+01, percent-clipped=1.0 2024-08-20 00:21:42,167 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14350, loss[loss=0.08689, beats_loss=0.01324, ecapa_loss=0.0001187, whisper_loss=0.07246, over 19648.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.00014, whisper_loss=0.09099, over 3763984.07 frames. ], batch size: 79, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:21:42,424 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 00:21:43,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. 
limit=6.0 2024-08-20 00:22:06,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4589490.0, ans=0.07 2024-08-20 00:22:06,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4589490.0, ans=0.1 2024-08-20 00:22:46,286 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-20 00:22:52,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4589690.0, ans=0.0 2024-08-20 00:22:54,468 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:23:08,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4589790.0, ans=0.1 2024-08-20 00:23:16,990 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14400, loss[loss=0.08895, beats_loss=0.01082, ecapa_loss=0.0001626, whisper_loss=0.07651, over 20961.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.09016, over 3767074.13 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:23:35,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4589990.0, ans=0.0 2024-08-20 00:23:48,545 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 19 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 00:24:41,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.268e+01 2.508e+01 2.742e+01 3.367e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 00:24:48,106 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14450, loss[loss=0.08346, beats_loss=0.01309, ecapa_loss=0.0001184, whisper_loss=0.06918, over 22072.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.000139, whisper_loss=0.08945, over 3760447.80 frames. ], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:24:49,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4590390.0, ans=0.125 2024-08-20 00:24:53,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4590390.0, ans=0.0 2024-08-20 00:24:53,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4590390.0, ans=0.125 2024-08-20 00:24:54,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4590390.0, ans=0.125 2024-08-20 00:24:54,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4590390.0, ans=0.125 2024-08-20 00:25:16,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=4590490.0, ans=0.2 2024-08-20 00:25:37,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4590590.0, ans=0.1 2024-08-20 00:25:43,753 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 00:25:51,239 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 00:25:56,389 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 00:26:24,561 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14500, loss[loss=0.08994, beats_loss=0.0106, ecapa_loss=0.0001187, whisper_loss=0.07816, over 19865.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001397, whisper_loss=0.08966, over 3805853.51 frames. ], batch size: 75, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:26:32,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4590890.0, ans=0.125 2024-08-20 00:27:02,403 WARNING [optim.py:496] (3/4) Scaling gradients by 0.034884583204984665, model_norm_threshold=50.15473556518555 2024-08-20 00:27:02,606 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.848e+05, grad_sumsq=8.328e+07, orig_rms_sq=1.062e-02 2024-08-20 00:27:04,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4591090.0, ans=0.0 2024-08-20 00:27:26,669 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:27:30,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4591190.0, ans=0.125 2024-08-20 00:27:32,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4591190.0, ans=0.0 2024-08-20 00:27:52,609 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.301e+01 2.496e+01 2.802e+01 1.438e+03, threshold=4.992e+01, percent-clipped=1.0 2024-08-20 00:27:59,130 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14550, loss[loss=0.1149, beats_loss=0.009672, ecapa_loss=0.0001267, whisper_loss=0.104, over 15757.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.09031, over 3831869.87 frames. 
], batch size: 61, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:28:01,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4591390.0, ans=0.05 2024-08-20 00:28:02,369 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 00:28:22,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4591490.0, ans=0.05 2024-08-20 00:28:26,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4591490.0, ans=0.1 2024-08-20 00:28:28,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4591490.0, ans=0.05 2024-08-20 00:28:42,412 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 22 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-20 00:29:14,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4591790.0, ans=0.05 2024-08-20 00:29:15,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.84 vs. limit=10.0 2024-08-20 00:29:26,934 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 00:29:33,056 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14600, loss[loss=0.09652, beats_loss=0.01274, ecapa_loss=0.0001372, whisper_loss=0.08242, over 20707.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001383, whisper_loss=0.09024, over 3851319.06 frames. ], batch size: 87, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:29:43,395 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
33 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 00:29:46,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4591890.0, ans=0.125 2024-08-20 00:29:49,678 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-20 00:30:16,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4592090.0, ans=0.0 2024-08-20 00:30:20,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4592090.0, ans=0.125 2024-08-20 00:30:38,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4592190.0, ans=0.125 2024-08-20 00:30:40,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4592190.0, ans=0.125 2024-08-20 00:30:47,502 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 00:31:02,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.406e+01 2.621e+01 2.917e+01 4.385e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-20 00:31:05,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4592290.0, ans=0.1 2024-08-20 00:31:07,361 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14650, loss[loss=0.1212, beats_loss=0.00865, ecapa_loss=0.000124, whisper_loss=0.1113, over 22373.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.09035, over 3824613.11 frames. 
], batch size: 84, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:31:13,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4592390.0, ans=0.125 2024-08-20 00:31:16,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4592390.0, ans=0.0 2024-08-20 00:31:28,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4592490.0, ans=0.125 2024-08-20 00:32:03,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-20 00:32:11,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4592690.0, ans=0.125 2024-08-20 00:32:20,586 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 00:32:34,788 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 20 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-20 00:32:40,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4592890.0, ans=0.2 2024-08-20 00:32:41,433 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14700, loss[loss=0.09194, beats_loss=0.01165, ecapa_loss=0.000165, whisper_loss=0.07864, over 21743.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001399, whisper_loss=0.09008, over 3847290.24 frames. 
], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:33:02,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4592990.0, ans=0.125 2024-08-20 00:33:10,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4592990.0, ans=0.125 2024-08-20 00:33:31,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4593090.0, ans=0.0 2024-08-20 00:33:35,395 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 00:33:36,798 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 00:33:44,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4593190.0, ans=0.0 2024-08-20 00:34:05,626 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 24 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-20 00:34:12,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.392e+01 2.545e+01 2.884e+01 3.743e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-20 00:34:17,498 INFO [train_multi_KD3.py:1117] (3/4) Epoch 31, batch 14750, loss[loss=0.1075, beats_loss=0.01138, ecapa_loss=0.0001133, whisper_loss=0.095, over 23801.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001397, whisper_loss=0.09016, over 3864740.77 frames. ], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:34:30,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4593390.0, ans=0.125 2024-08-20 00:34:32,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2024-08-20 00:34:41,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.16 vs. limit=22.5 2024-08-20 00:34:50,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4593490.0, ans=0.0 2024-08-20 00:34:50,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4593490.0, ans=0.05 2024-08-20 00:34:50,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4593490.0, ans=0.0 2024-08-20 00:35:13,195 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 00:35:27,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4593690.0, ans=0.125 2024-08-20 00:36:08,657 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 0, loss[loss=0.1289, beats_loss=0.007212, ecapa_loss=0.0001593, whisper_loss=0.1201, over 23040.00 frames. ], tot_loss[loss=0.1289, beats_loss=0.007212, ecapa_loss=0.0001593, whisper_loss=0.1201, over 23040.00 frames. ], batch size: 88, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:36:08,657 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 00:36:43,410 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005131, whisper_loss=0.2488, over 931116.00 frames. 2024-08-20 00:37:05,792 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 
2024-08-20 00:38:06,247 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7883, 1.6535, 1.8016, 1.2575, 1.4552, 1.9826, 2.3220, 1.5394], device='cuda:3') 2024-08-20 00:38:39,971 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 00:38:39,973 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 00:39:06,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4593900.0, ans=0.125 2024-08-20 00:39:08,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4593900.0, ans=0.0 2024-08-20 00:39:08,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4593900.0, ans=0.125 2024-08-20 00:39:17,579 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 00:39:32,584 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-20 00:39:39,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-08-20 00:39:48,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.82 vs. limit=22.5 2024-08-20 00:40:04,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4594100.0, ans=0.125 2024-08-20 00:40:07,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.02 vs. 
limit=10.0 2024-08-20 00:40:30,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4594200.0, ans=0.2 2024-08-20 00:40:40,591 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 50, loss[loss=0.06132, beats_loss=0.009351, ecapa_loss=0.0001741, whisper_loss=0.05023, over 18662.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009109, ecapa_loss=0.0001398, whisper_loss=0.09116, over 882218.59 frames. ], batch size: 81, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:40:48,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=12.0 2024-08-20 00:40:50,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4594300.0, ans=0.1 2024-08-20 00:40:55,071 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.478e+01 2.729e+01 3.043e+01 3.966e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-20 00:40:57,294 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=15.0 2024-08-20 00:41:26,353 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 00:41:43,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4594500.0, ans=0.125 2024-08-20 00:41:46,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-20 00:42:01,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2024-08-20 00:42:09,130 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 00:42:26,399 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:42:38,544 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 100, loss[loss=0.1174, beats_loss=0.007924, ecapa_loss=0.0001472, whisper_loss=0.108, over 19533.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009074, ecapa_loss=0.0001415, whisper_loss=0.09015, over 1534051.20 frames. ], batch size: 76, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:42:43,699 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 00:42:51,388 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 00:42:55,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4594800.0, ans=0.2 2024-08-20 00:43:09,850 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 00:43:20,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2024-08-20 00:43:21,422 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 00:43:23,866 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 00:44:06,404 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 18 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 00:44:12,560 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 00:44:20,325 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 14 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 00:44:22,818 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 00:44:35,550 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 00:44:37,716 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 150, loss[loss=0.1033, beats_loss=0.01003, ecapa_loss=0.0001315, whisper_loss=0.09196, over 23522.00 frames. ], tot_loss[loss=0.101, beats_loss=0.009211, ecapa_loss=0.0001421, whisper_loss=0.0904, over 2017074.50 frames. ], batch size: 92, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:44:50,016 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.529e+01 2.741e+01 3.091e+01 3.915e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-20 00:44:50,262 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 00:45:16,672 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 35 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 00:45:27,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-08-20 00:45:30,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4595500.0, ans=0.0 2024-08-20 00:45:45,010 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 29 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 00:45:46,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4595600.0, ans=0.125 2024-08-20 00:45:46,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.46 vs. 
limit=15.0 2024-08-20 00:46:04,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4595700.0, ans=0.125 2024-08-20 00:46:12,734 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 200, loss[loss=0.1004, beats_loss=0.01268, ecapa_loss=0.0001123, whisper_loss=0.08657, over 16413.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.00965, ecapa_loss=0.0001411, whisper_loss=0.08935, over 2422201.31 frames. ], batch size: 64, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:46:36,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.89 vs. limit=6.0 2024-08-20 00:46:41,030 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 22 from LS+wenet, 16 from Vox, 13 fro AS 2024-08-20 00:46:45,629 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03673094883561134, model_norm_threshold=54.82755661010742 2024-08-20 00:46:45,798 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.205e+05, grad_sumsq=3.205e+05, orig_rms_sq=1.000e+00 2024-08-20 00:46:59,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4596000.0, ans=0.05 2024-08-20 00:47:31,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4596200.0, ans=0.125 2024-08-20 00:47:34,613 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 00:47:43,271 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 250, loss[loss=0.1123, beats_loss=0.007737, ecapa_loss=0.0001283, whisper_loss=0.1032, over 22986.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.00985, ecapa_loss=0.000142, whisper_loss=0.08951, over 2700482.24 frames. ], batch size: 84, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:47:50,332 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 00:47:53,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.311e+01 2.593e+01 2.981e+01 1.493e+03, threshold=5.185e+01, percent-clipped=1.0 2024-08-20 00:48:13,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4596400.0, ans=0.0 2024-08-20 00:48:29,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4596500.0, ans=0.0 2024-08-20 00:48:43,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4596600.0, ans=0.0 2024-08-20 00:49:00,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4596700.0, ans=0.125 2024-08-20 00:49:08,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4596800.0, ans=0.0 2024-08-20 00:49:09,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4596800.0, ans=0.05 2024-08-20 00:49:09,728 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 300, loss[loss=0.1003, beats_loss=0.01178, ecapa_loss=0.000137, whisper_loss=0.08717, over 19500.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.009954, ecapa_loss=0.0001415, whisper_loss=0.08947, over 2904252.16 frames. 
], batch size: 78, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:49:25,996 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03387049213051796, model_norm_threshold=51.854286193847656 2024-08-20 00:49:26,163 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.005e+05, grad_sumsq=9.116e+04, orig_rms_sq=3.297e+00 2024-08-20 00:49:31,813 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 00:50:14,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4597100.0, ans=0.035 2024-08-20 00:50:14,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4597100.0, ans=0.0 2024-08-20 00:50:16,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4597100.0, ans=0.0 2024-08-20 00:50:24,987 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 00:50:37,091 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 350, loss[loss=0.1053, beats_loss=0.01171, ecapa_loss=0.0001243, whisper_loss=0.0923, over 22571.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01012, ecapa_loss=0.0001423, whisper_loss=0.0888, over 3084538.18 frames. ], batch size: 90, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:50:39,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4597300.0, ans=0.0 2024-08-20 00:50:47,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. 
limit=15.0 2024-08-20 00:50:48,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.225e+01 2.468e+01 2.778e+01 1.531e+03, threshold=4.937e+01, percent-clipped=2.0 2024-08-20 00:50:52,381 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 17 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-20 00:51:01,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4597400.0, ans=0.2 2024-08-20 00:51:31,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4597600.0, ans=0.125 2024-08-20 00:51:35,531 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 00:51:53,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4597700.0, ans=0.09899494936611666 2024-08-20 00:52:04,909 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 400, loss[loss=0.07819, beats_loss=0.01195, ecapa_loss=0.0001215, whisper_loss=0.06502, over 16192.00 frames. ], tot_loss[loss=0.09903, beats_loss=0.01021, ecapa_loss=0.0001426, whisper_loss=0.08739, over 3209995.48 frames. ], batch size: 64, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:52:24,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.53 vs. 
limit=10.0 2024-08-20 00:52:25,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4597900.0, ans=0.125 2024-08-20 00:52:38,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4597900.0, ans=0.125 2024-08-20 00:53:21,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4598200.0, ans=0.125 2024-08-20 00:53:22,896 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 13 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-20 00:53:34,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.70 vs. limit=10.0 2024-08-20 00:53:35,342 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 450, loss[loss=0.1119, beats_loss=0.008232, ecapa_loss=0.0001347, whisper_loss=0.1023, over 18121.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01021, ecapa_loss=0.0001409, whisper_loss=0.08863, over 3365390.96 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:53:35,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4598300.0, ans=0.125 2024-08-20 00:53:43,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.32 vs. limit=10.0 2024-08-20 00:53:44,216 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 00:53:45,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.300e+01 2.526e+01 2.780e+01 3.592e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-20 00:53:58,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-20 00:54:07,918 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 31 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-20 00:54:11,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4598500.0, ans=0.0 2024-08-20 00:54:29,917 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 00:54:34,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.95 vs. limit=22.5 2024-08-20 00:54:52,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4598700.0, ans=0.1 2024-08-20 00:54:54,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4598700.0, ans=0.5 2024-08-20 00:55:01,781 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 500, loss[loss=0.1077, beats_loss=0.01001, ecapa_loss=0.000135, whisper_loss=0.09637, over 22419.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0102, ecapa_loss=0.0001405, whisper_loss=0.08879, over 3437772.74 frames. 
], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:55:06,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4598800.0, ans=0.125 2024-08-20 00:55:08,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4598800.0, ans=0.125 2024-08-20 00:55:33,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5 2024-08-20 00:55:39,846 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 00:55:41,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4599000.0, ans=0.2 2024-08-20 00:56:06,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4599100.0, ans=0.125 2024-08-20 00:56:31,183 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 550, loss[loss=0.1048, beats_loss=0.009735, ecapa_loss=0.0001431, whisper_loss=0.09368, over 24374.00 frames. ], tot_loss[loss=0.09984, beats_loss=0.01025, ecapa_loss=0.0001402, whisper_loss=0.08818, over 3476126.70 frames. ], batch size: 98, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:56:33,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.38 vs. limit=22.5 2024-08-20 00:56:41,835 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.287e+01 2.466e+01 2.719e+01 4.330e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-20 00:56:53,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4599400.0, ans=0.1 2024-08-20 00:56:59,416 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
25 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-20 00:57:06,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4599500.0, ans=0.0 2024-08-20 00:57:12,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4599500.0, ans=0.0 2024-08-20 00:57:17,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4599500.0, ans=0.125 2024-08-20 00:57:32,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4599600.0, ans=0.1 2024-08-20 00:57:48,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.40 vs. limit=22.5 2024-08-20 00:57:53,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=4599700.0, ans=0.1 2024-08-20 00:58:03,841 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 600, loss[loss=0.1011, beats_loss=0.009437, ecapa_loss=0.0001279, whisper_loss=0.09041, over 18862.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01014, ecapa_loss=0.000141, whisper_loss=0.08947, over 3528203.47 frames. ], batch size: 77, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:58:13,797 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:58:23,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4599900.0, ans=0.0 2024-08-20 00:58:25,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4599900.0, ans=0.125 2024-08-20 00:58:50,616 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 00:59:10,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4600100.0, ans=0.125 2024-08-20 00:59:17,575 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 00:59:26,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4600200.0, ans=0.125 2024-08-20 00:59:35,292 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 650, loss[loss=0.06085, beats_loss=0.01213, ecapa_loss=0.0001222, whisper_loss=0.04749, over 12760.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01019, ecapa_loss=0.0001399, whisper_loss=0.08999, over 3603213.36 frames. ], batch size: 49, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:59:42,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4600300.0, ans=0.125 2024-08-20 00:59:43,503 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 16 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 00:59:46,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.232e+01 2.532e+01 2.844e+01 3.570e+02, threshold=5.065e+01, percent-clipped=2.0 2024-08-20 00:59:53,521 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 23 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-20 01:00:06,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. 
limit=12.0 2024-08-20 01:00:20,700 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07462587207555771, model_norm_threshold=50.64724349975586 2024-08-20 01:00:20,867 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.123e+04, grad_sumsq=9.123e+04, orig_rms_sq=1.000e+00 2024-08-20 01:00:42,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4600600.0, ans=0.1 2024-08-20 01:00:55,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2024-08-20 01:01:05,205 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 700, loss[loss=0.1086, beats_loss=0.01057, ecapa_loss=0.0001377, whisper_loss=0.0967, over 22199.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01026, ecapa_loss=0.00014, whisper_loss=0.0903, over 3646489.00 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:01:08,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4600800.0, ans=0.125 2024-08-20 01:01:26,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4600900.0, ans=0.125 2024-08-20 01:01:26,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4600900.0, ans=0.125 2024-08-20 01:01:32,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4600900.0, ans=0.125 2024-08-20 01:01:46,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4601000.0, ans=0.0 2024-08-20 01:01:48,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4601000.0, ans=0.1 2024-08-20 01:02:00,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4601100.0, ans=0.125 2024-08-20 01:02:34,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4601300.0, ans=0.0 2024-08-20 01:02:34,861 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 750, loss[loss=0.1049, beats_loss=0.01061, ecapa_loss=0.000133, whisper_loss=0.09301, over 23154.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01022, ecapa_loss=0.0001396, whisper_loss=0.09012, over 3663784.80 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:02:45,507 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.348e+01 2.626e+01 2.965e+01 6.787e+02, threshold=5.252e+01, percent-clipped=3.0 2024-08-20 01:02:53,176 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 01:02:53,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-08-20 01:03:05,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2024-08-20 01:03:22,113 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 01:03:22,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4601500.0, ans=0.125 2024-08-20 01:03:26,804 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 01:03:42,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4601700.0, ans=0.0 2024-08-20 01:04:00,102 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 800, loss[loss=0.09079, beats_loss=0.01028, ecapa_loss=0.000121, whisper_loss=0.0793, over 22939.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01024, ecapa_loss=0.0001388, whisper_loss=0.08963, over 3690567.08 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:04:19,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. 
limit=6.0 2024-08-20 01:04:21,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4601900.0, ans=0.125 2024-08-20 01:04:23,507 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 01:04:28,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=15.0 2024-08-20 01:04:47,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4602000.0, ans=0.125 2024-08-20 01:04:51,539 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 01:04:53,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4602100.0, ans=0.0 2024-08-20 01:05:00,299 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 01:05:05,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4602100.0, ans=0.125 2024-08-20 01:05:08,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4602200.0, ans=0.0 2024-08-20 01:05:13,614 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 01:05:19,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4602200.0, ans=0.125 2024-08-20 01:05:25,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4602300.0, ans=0.2 2024-08-20 01:05:26,417 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 850, loss[loss=0.0925, beats_loss=0.01301, ecapa_loss=0.0001313, whisper_loss=0.07817, over 16212.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01022, ecapa_loss=0.0001386, whisper_loss=0.08968, over 3700041.61 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:05:37,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.270e+01 2.498e+01 2.868e+01 4.208e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-20 01:05:41,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4602300.0, ans=0.125 2024-08-20 01:05:44,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4602400.0, ans=0.0 2024-08-20 01:05:46,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4602400.0, ans=10.0 2024-08-20 01:06:07,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4602500.0, ans=0.0 2024-08-20 01:06:07,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4602500.0, ans=0.07 2024-08-20 01:06:31,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4602600.0, ans=0.125 2024-08-20 01:06:40,325 INFO [scaling.py:1024] (3/4) Whitening: 
name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-08-20 01:06:41,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-20 01:06:49,966 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 01:06:53,212 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 900, loss[loss=0.1119, beats_loss=0.0101, ecapa_loss=0.0001412, whisper_loss=0.1004, over 21255.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01018, ecapa_loss=0.0001388, whisper_loss=0.08999, over 3726926.44 frames. ], batch size: 83, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:06:57,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4602800.0, ans=0.125 2024-08-20 01:07:07,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4602800.0, ans=0.0 2024-08-20 01:07:07,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4602800.0, ans=0.125 2024-08-20 01:07:20,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4602900.0, ans=0.125 2024-08-20 01:07:42,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=15.0 2024-08-20 01:08:00,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2024-08-20 01:08:01,836 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 01:08:19,548 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 950, loss[loss=0.0973, beats_loss=0.007507, ecapa_loss=0.0001902, whisper_loss=0.08789, over 12586.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01023, ecapa_loss=0.0001378, whisper_loss=0.08988, over 3736620.27 frames. ], batch size: 51, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:08:20,184 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 01:08:22,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4603300.0, ans=0.0 2024-08-20 01:08:27,529 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 01:08:29,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4603300.0, ans=0.125 2024-08-20 01:08:31,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.373e+01 2.705e+01 3.029e+01 3.919e+02, threshold=5.410e+01, percent-clipped=3.0 2024-08-20 01:08:49,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4603400.0, ans=0.0 2024-08-20 01:08:53,445 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.858e-01 2024-08-20 01:09:05,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-20 01:09:07,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4603500.0, ans=0.125 2024-08-20 01:09:24,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.81 vs. 
limit=15.0 2024-08-20 01:09:31,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4603700.0, ans=0.0 2024-08-20 01:09:46,130 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1000, loss[loss=0.07402, beats_loss=0.009167, ecapa_loss=0.0001435, whisper_loss=0.06342, over 14179.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01027, ecapa_loss=0.0001375, whisper_loss=0.08923, over 3717511.52 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:10:23,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4604000.0, ans=0.0 2024-08-20 01:11:01,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4604200.0, ans=0.0 2024-08-20 01:11:10,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4604200.0, ans=0.0 2024-08-20 01:11:18,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.67 vs. limit=10.0 2024-08-20 01:11:18,661 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1050, loss[loss=0.1129, beats_loss=0.008019, ecapa_loss=0.000129, whisper_loss=0.1035, over 14426.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01026, ecapa_loss=0.0001381, whisper_loss=0.08879, over 3705309.92 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:11:24,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.41 vs. 
limit=15.0 2024-08-20 01:11:29,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4604300.0, ans=0.0 2024-08-20 01:11:31,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-08-20 01:11:32,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.222e+01 2.426e+01 2.735e+01 4.130e+01, threshold=4.852e+01, percent-clipped=0.0 2024-08-20 01:11:33,179 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 01:12:03,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4604500.0, ans=0.125 2024-08-20 01:12:04,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4604500.0, ans=0.125 2024-08-20 01:12:16,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4604600.0, ans=0.125 2024-08-20 01:12:24,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4604600.0, ans=0.125 2024-08-20 01:12:27,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4604600.0, ans=0.04949747468305833 2024-08-20 01:12:28,765 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
16 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 01:12:29,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4604600.0, ans=0.0 2024-08-20 01:12:35,895 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.112e-01 2024-08-20 01:12:43,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=4604700.0, ans=0.025 2024-08-20 01:12:49,411 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1100, loss[loss=0.09555, beats_loss=0.01244, ecapa_loss=0.0001405, whisper_loss=0.08171, over 20385.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01016, ecapa_loss=0.0001389, whisper_loss=0.08925, over 3704007.79 frames. ], batch size: 84, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:13:57,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4605200.0, ans=0.1 2024-08-20 01:13:58,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-20 01:14:15,504 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1150, loss[loss=0.1087, beats_loss=0.006016, ecapa_loss=0.0001847, whisper_loss=0.1008, over 15375.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01016, ecapa_loss=0.0001386, whisper_loss=0.08967, over 3712745.18 frames. 
], batch size: 55, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:14:25,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4605300.0, ans=0.125 2024-08-20 01:14:27,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.314e+01 2.565e+01 2.766e+01 1.499e+02, threshold=5.130e+01, percent-clipped=2.0 2024-08-20 01:14:35,039 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 01:14:36,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4605400.0, ans=0.5 2024-08-20 01:14:42,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.95 vs. limit=6.0 2024-08-20 01:14:43,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4605400.0, ans=0.0 2024-08-20 01:14:46,921 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 25 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-20 01:15:00,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4605500.0, ans=0.125 2024-08-20 01:15:01,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-20 01:15:12,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-20 01:15:20,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4605600.0, ans=0.035 2024-08-20 01:15:30,369 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
20 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-20 01:15:35,677 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 01:15:40,913 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1200, loss[loss=0.08324, beats_loss=0.01125, ecapa_loss=0.0001268, whisper_loss=0.07072, over 17255.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01027, ecapa_loss=0.000138, whisper_loss=0.08943, over 3723597.96 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:16:03,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-20 01:16:35,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-08-20 01:16:40,559 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 01:16:52,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4606100.0, ans=0.125 2024-08-20 01:16:53,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4606200.0, ans=0.125 2024-08-20 01:17:10,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4606200.0, ans=0.0 2024-08-20 01:17:15,238 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1250, loss[loss=0.0853, beats_loss=0.01143, ecapa_loss=0.0001193, whisper_loss=0.07268, over 19425.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0104, ecapa_loss=0.000137, whisper_loss=0.08839, over 3732455.44 frames. 
], batch size: 78, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:17:17,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4606300.0, ans=0.125 2024-08-20 01:17:27,930 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 01:17:32,637 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.240e+01 2.537e+01 2.870e+01 6.660e+01, threshold=5.073e+01, percent-clipped=2.0 2024-08-20 01:17:52,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4606400.0, ans=0.125 2024-08-20 01:18:17,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.61 vs. limit=12.0 2024-08-20 01:18:59,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=15.0 2024-08-20 01:19:13,296 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1300, loss[loss=0.08296, beats_loss=0.01191, ecapa_loss=0.0001365, whisper_loss=0.06969, over 16728.00 frames. ], tot_loss[loss=0.09986, beats_loss=0.01044, ecapa_loss=0.000137, whisper_loss=0.08805, over 3726327.68 frames. 
], batch size: 67, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:19:18,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4606800.0, ans=0.0 2024-08-20 01:19:19,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4606800.0, ans=0.125 2024-08-20 01:19:31,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4606900.0, ans=0.1 2024-08-20 01:19:32,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.81 vs. limit=10.0 2024-08-20 01:19:39,061 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 01:19:43,122 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 01:19:44,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4606900.0, ans=0.0 2024-08-20 01:19:48,867 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 01:19:59,998 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 01:20:39,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4607200.0, ans=0.0 2024-08-20 01:20:53,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4607200.0, ans=0.0 2024-08-20 01:20:56,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.49 vs. 
limit=10.0 2024-08-20 01:21:03,546 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1350, loss[loss=0.114, beats_loss=0.00856, ecapa_loss=0.0001577, whisper_loss=0.1039, over 15479.00 frames. ], tot_loss[loss=0.09989, beats_loss=0.01042, ecapa_loss=0.0001382, whisper_loss=0.08809, over 3735083.61 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:21:17,548 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 01:21:22,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.244e+01 2.406e+01 2.687e+01 4.080e+01, threshold=4.812e+01, percent-clipped=0.0 2024-08-20 01:22:01,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4607500.0, ans=0.0 2024-08-20 01:22:04,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4607500.0, ans=0.0 2024-08-20 01:22:05,837 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 27 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-20 01:22:21,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4607600.0, ans=0.125 2024-08-20 01:22:34,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4607600.0, ans=0.125 2024-08-20 01:22:44,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4607700.0, ans=0.125 2024-08-20 01:22:56,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4607700.0, ans=0.0 2024-08-20 01:23:07,436 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1400, loss[loss=0.1003, beats_loss=0.01194, ecapa_loss=0.0001164, whisper_loss=0.08715, over 13240.00 frames. 
], tot_loss[loss=0.09998, beats_loss=0.01041, ecapa_loss=0.0001371, whisper_loss=0.0882, over 3725891.81 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:23:13,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4607800.0, ans=0.1 2024-08-20 01:23:39,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4607900.0, ans=0.2 2024-08-20 01:23:44,249 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 25 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 01:24:16,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4608000.0, ans=0.1 2024-08-20 01:24:33,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4608100.0, ans=0.125 2024-08-20 01:24:35,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4608100.0, ans=0.2 2024-08-20 01:24:37,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4608100.0, ans=0.0 2024-08-20 01:25:06,543 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0315290167927742, model_norm_threshold=48.11598205566406 2024-08-20 01:25:06,706 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.963e+05, grad_sumsq=4.963e+05, orig_rms_sq=1.000e+00 2024-08-20 01:25:09,015 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1450, loss[loss=0.09519, beats_loss=0.01003, ecapa_loss=0.000135, whisper_loss=0.0838, over 13787.00 frames. ], tot_loss[loss=0.09954, beats_loss=0.01041, ecapa_loss=0.0001369, whisper_loss=0.08776, over 3728196.95 frames. 
], batch size: 52, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:25:13,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0 2024-08-20 01:25:26,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.252e+01 2.461e+01 2.741e+01 1.526e+03, threshold=4.922e+01, percent-clipped=2.0 2024-08-20 01:25:35,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4608400.0, ans=0.0 2024-08-20 01:25:46,186 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 24 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 01:25:59,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2024-08-20 01:26:15,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4608500.0, ans=0.125 2024-08-20 01:27:14,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4608700.0, ans=0.125 2024-08-20 01:27:31,049 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1500, loss[loss=0.1202, beats_loss=0.01074, ecapa_loss=0.0001013, whisper_loss=0.1085, over 21261.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001369, whisper_loss=0.0884, over 3733114.85 frames. 
], batch size: 77, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:27:46,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4608800.0, ans=0.0 2024-08-20 01:27:47,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4608800.0, ans=0.2 2024-08-20 01:27:51,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4608900.0, ans=0.125 2024-08-20 01:28:29,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4609000.0, ans=0.0 2024-08-20 01:28:47,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4609100.0, ans=0.0 2024-08-20 01:29:13,097 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1550, loss[loss=0.1102, beats_loss=0.01012, ecapa_loss=0.0001617, whisper_loss=0.09848, over 21545.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01034, ecapa_loss=0.000137, whisper_loss=0.08856, over 3729114.23 frames. ], batch size: 85, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:29:14,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4609300.0, ans=0.2 2024-08-20 01:29:27,018 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.175e+01 2.465e+01 2.675e+01 6.220e+01, threshold=4.930e+01, percent-clipped=1.0 2024-08-20 01:29:41,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.46 vs. 
limit=15.0 2024-08-20 01:30:33,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4609700.0, ans=0.1 2024-08-20 01:30:37,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4609700.0, ans=0.5 2024-08-20 01:30:49,740 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1600, loss[loss=0.1174, beats_loss=0.007423, ecapa_loss=0.0001702, whisper_loss=0.1083, over 16805.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01029, ecapa_loss=0.0001371, whisper_loss=0.08921, over 3696865.72 frames. ], batch size: 70, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:30:49,888 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 01:31:13,165 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 01:31:17,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4609900.0, ans=0.0 2024-08-20 01:31:37,261 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 01:31:40,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4610000.0, ans=0.125 2024-08-20 01:31:46,059 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-20 01:31:47,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4610100.0, ans=0.1 2024-08-20 01:32:13,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4610200.0, ans=0.0 2024-08-20 01:32:20,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.48 vs. limit=22.5 2024-08-20 01:32:24,535 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1650, loss[loss=0.1036, beats_loss=0.01035, ecapa_loss=0.0001345, whisper_loss=0.09195, over 18668.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01033, ecapa_loss=0.0001368, whisper_loss=0.08882, over 3719952.92 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:32:39,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.232e+01 2.495e+01 2.715e+01 1.384e+02, threshold=4.990e+01, percent-clipped=1.0 2024-08-20 01:32:41,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4610300.0, ans=0.2 2024-08-20 01:32:54,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4610400.0, ans=0.0 2024-08-20 01:33:14,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4610500.0, ans=0.125 2024-08-20 01:33:21,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4610600.0, ans=0.125 2024-08-20 01:33:29,918 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 01:33:52,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0 2024-08-20 01:33:57,993 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1700, loss[loss=0.1076, beats_loss=0.008719, ecapa_loss=0.0001523, whisper_loss=0.09735, over 17669.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01023, ecapa_loss=0.0001372, whisper_loss=0.08996, over 3718621.78 frames. ], batch size: 69, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:34:13,133 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.272e+00 2024-08-20 01:34:29,079 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 01:34:37,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4611000.0, ans=0.125 2024-08-20 01:34:39,724 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 01:35:15,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4611200.0, ans=0.2 2024-08-20 01:35:26,068 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1750, loss[loss=0.1027, beats_loss=0.008575, ecapa_loss=0.0001554, whisper_loss=0.09253, over 13597.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01019, ecapa_loss=0.0001375, whisper_loss=0.09062, over 3706087.62 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:35:28,804 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
27 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-20 01:35:38,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.241e+01 2.449e+01 2.717e+01 4.269e+01, threshold=4.898e+01, percent-clipped=0.0 2024-08-20 01:35:41,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4611300.0, ans=0.0 2024-08-20 01:36:23,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4611600.0, ans=0.125 2024-08-20 01:36:23,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4611600.0, ans=0.1 2024-08-20 01:36:31,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4611600.0, ans=0.0 2024-08-20 01:36:33,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4611600.0, ans=0.125 2024-08-20 01:36:44,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4611700.0, ans=0.1 2024-08-20 01:36:46,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4611700.0, ans=0.0 2024-08-20 01:36:47,755 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 01:36:52,769 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1800, loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001512, whisper_loss=0.09147, over 22030.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01024, ecapa_loss=0.0001372, whisper_loss=0.09021, over 3734589.51 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:36:54,611 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 01:37:07,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4611800.0, ans=0.1 2024-08-20 01:37:27,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.70 vs. limit=10.0 2024-08-20 01:37:30,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2024-08-20 01:37:37,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4612000.0, ans=0.125 2024-08-20 01:37:44,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4612100.0, ans=0.125 2024-08-20 01:37:49,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4612100.0, ans=0.0 2024-08-20 01:37:54,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.71 vs. limit=22.5 2024-08-20 01:37:56,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4612100.0, ans=0.0 2024-08-20 01:38:18,890 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1850, loss[loss=0.1095, beats_loss=0.01151, ecapa_loss=0.0001457, whisper_loss=0.09658, over 18483.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01021, ecapa_loss=0.0001371, whisper_loss=0.09066, over 3763373.14 frames. ], batch size: 76, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:38:24,958 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 01:38:31,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.236e+01 2.438e+01 2.690e+01 3.613e+01, threshold=4.877e+01, percent-clipped=0.0 2024-08-20 01:38:57,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4612500.0, ans=0.0 2024-08-20 01:39:23,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4612600.0, ans=0.125 2024-08-20 01:39:29,871 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-20 01:39:44,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4612700.0, ans=0.2 2024-08-20 01:39:47,180 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1900, loss[loss=0.09615, beats_loss=0.01136, ecapa_loss=0.0001539, whisper_loss=0.08325, over 12463.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01027, ecapa_loss=0.0001365, whisper_loss=0.09023, over 3728009.20 frames. ], batch size: 52, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:39:49,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4612800.0, ans=0.1 2024-08-20 01:39:49,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4612800.0, ans=0.0 2024-08-20 01:40:26,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4613000.0, ans=0.0 2024-08-20 01:40:30,855 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 01:40:38,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4613100.0, ans=0.0 2024-08-20 01:40:53,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4613100.0, ans=0.1 2024-08-20 01:41:02,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4613200.0, ans=0.125 2024-08-20 01:41:14,215 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 1950, loss[loss=0.09442, beats_loss=0.01126, ecapa_loss=0.0001195, whisper_loss=0.08197, over 21284.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01033, ecapa_loss=0.0001355, whisper_loss=0.08951, over 3751089.66 frames. ], batch size: 84, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:41:19,777 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 42 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 01:41:26,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.348e+01 2.572e+01 2.844e+01 4.490e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-20 01:41:30,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4613400.0, ans=0.125 2024-08-20 01:41:32,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4613400.0, ans=0.0 2024-08-20 01:41:33,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4613400.0, ans=0.125 2024-08-20 01:41:34,841 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 01:42:39,934 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2000, loss[loss=0.1022, beats_loss=0.009976, ecapa_loss=0.0001745, whisper_loss=0.09045, over 13835.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01038, ecapa_loss=0.0001347, whisper_loss=0.08896, over 3737859.41 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:42:45,202 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 13 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 01:42:51,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4613800.0, ans=0.1 2024-08-20 01:42:56,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0 2024-08-20 01:43:00,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4613900.0, ans=0.2 2024-08-20 01:43:04,429 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0779990628361702, model_norm_threshold=51.44282531738281 2024-08-20 01:43:04,597 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.291e+04, grad_sumsq=4.291e+04, orig_rms_sq=1.000e+00 2024-08-20 01:43:19,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4614000.0, ans=0.1 2024-08-20 01:43:19,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4614000.0, ans=0.0 2024-08-20 01:43:35,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4614100.0, ans=0.1 2024-08-20 01:43:43,290 INFO [train_multi_KD3.py:845] 
(3/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 01:43:48,951 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 27 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-20 01:44:06,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4614300.0, ans=0.2 2024-08-20 01:44:07,480 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2050, loss[loss=0.09659, beats_loss=0.01163, ecapa_loss=0.000161, whisper_loss=0.08335, over 20611.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001362, whisper_loss=0.08932, over 3756652.04 frames. ], batch size: 87, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:44:07,967 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 01:44:19,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.219e+01 2.452e+01 2.809e+01 6.595e+02, threshold=4.904e+01, percent-clipped=1.0 2024-08-20 01:44:20,318 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 01:44:32,005 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-20 01:45:06,439 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 01:45:24,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4614700.0, ans=0.125 2024-08-20 01:45:33,360 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2100, loss[loss=0.0909, beats_loss=0.01193, ecapa_loss=0.00014, whisper_loss=0.07757, over 20801.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001349, whisper_loss=0.0888, over 3759370.30 frames. 
], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:45:37,458 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 01:45:44,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4614800.0, ans=0.125 2024-08-20 01:45:44,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4614800.0, ans=0.125 2024-08-20 01:45:53,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4614900.0, ans=0.0 2024-08-20 01:46:06,098 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 19 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-20 01:46:16,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4615000.0, ans=0.125 2024-08-20 01:46:22,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4615000.0, ans=0.125 2024-08-20 01:46:23,545 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 01:46:27,004 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 01:47:00,098 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2150, loss[loss=0.09213, beats_loss=0.01032, ecapa_loss=0.0001219, whisper_loss=0.08059, over 17314.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01052, ecapa_loss=0.000134, whisper_loss=0.08829, over 3739368.49 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:47:08,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.55 vs. 
limit=22.5 2024-08-20 01:47:08,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-08-20 01:47:12,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.213e+01 2.411e+01 2.746e+01 4.203e+01, threshold=4.821e+01, percent-clipped=0.0 2024-08-20 01:47:22,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4615400.0, ans=0.125 2024-08-20 01:47:29,258 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 17 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 01:47:48,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4615500.0, ans=0.1 2024-08-20 01:47:53,129 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 28 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 01:48:25,549 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2200, loss[loss=0.09081, beats_loss=0.01236, ecapa_loss=0.000114, whisper_loss=0.07731, over 22508.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.01059, ecapa_loss=0.0001343, whisper_loss=0.088, over 3747779.26 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:48:38,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4615800.0, ans=0.125 2024-08-20 01:48:51,128 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 01:48:51,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4615900.0, ans=0.125 2024-08-20 01:48:54,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4615900.0, ans=0.125 2024-08-20 01:49:20,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4616100.0, ans=0.125 2024-08-20 01:49:22,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4616100.0, ans=0.2 2024-08-20 01:49:40,826 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 20 from LS+wenet, 21 from Vox, 14 fro AS 2024-08-20 01:49:50,344 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2250, loss[loss=0.1083, beats_loss=0.00956, ecapa_loss=0.000141, whisper_loss=0.09731, over 21729.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01062, ecapa_loss=0.0001346, whisper_loss=0.08806, over 3774738.09 frames. ], batch size: 85, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:49:52,178 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 01:50:02,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.187e+01 2.427e+01 2.680e+01 3.409e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-20 01:50:04,085 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 19 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 01:50:15,184 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 01:50:34,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4616500.0, ans=0.07 2024-08-20 01:50:46,981 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
20 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 01:50:49,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4616600.0, ans=0.125 2024-08-20 01:50:57,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4616700.0, ans=0.125 2024-08-20 01:51:01,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4616700.0, ans=0.2 2024-08-20 01:51:15,689 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2300, loss[loss=0.07064, beats_loss=0.0133, ecapa_loss=0.0001695, whisper_loss=0.05564, over 20613.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01058, ecapa_loss=0.0001363, whisper_loss=0.08885, over 3766412.99 frames. ], batch size: 95, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:51:30,217 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 01:51:36,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4616900.0, ans=0.125 2024-08-20 01:52:09,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=12.0 2024-08-20 01:52:14,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4617100.0, ans=0.125 2024-08-20 01:52:43,085 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2350, loss[loss=0.1114, beats_loss=0.007803, ecapa_loss=0.0001569, whisper_loss=0.102, over 18925.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001368, whisper_loss=0.08903, over 3769744.06 frames. 
], batch size: 73, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:52:55,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.315e+01 2.598e+01 2.990e+01 3.797e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-20 01:52:57,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4617300.0, ans=0.125 2024-08-20 01:52:59,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4617400.0, ans=0.125 2024-08-20 01:53:17,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4617500.0, ans=0.0 2024-08-20 01:53:29,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4617500.0, ans=0.2 2024-08-20 01:53:29,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4617500.0, ans=0.0 2024-08-20 01:53:48,251 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 01:53:59,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4617700.0, ans=0.125 2024-08-20 01:54:03,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=4617700.0, ans=22.5 2024-08-20 01:54:07,169 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2400, loss[loss=0.1092, beats_loss=0.007347, ecapa_loss=0.0001651, whisper_loss=0.1002, over 22385.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001373, whisper_loss=0.08956, over 3785177.97 frames. 
], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:54:20,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4617800.0, ans=0.0 2024-08-20 01:54:24,769 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 01:54:49,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4618000.0, ans=0.2 2024-08-20 01:54:58,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=12.0 2024-08-20 01:55:08,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4618100.0, ans=0.0 2024-08-20 01:55:16,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4618200.0, ans=0.125 2024-08-20 01:55:33,042 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2450, loss[loss=0.0927, beats_loss=0.01229, ecapa_loss=0.0001258, whisper_loss=0.07915, over 22362.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001365, whisper_loss=0.08991, over 3797338.39 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:55:37,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4618300.0, ans=0.05 2024-08-20 01:55:45,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.204e+01 2.412e+01 2.711e+01 4.337e+02, threshold=4.825e+01, percent-clipped=1.0 2024-08-20 01:55:59,320 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 01:56:03,347 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 01:56:09,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4618500.0, ans=0.125 2024-08-20 01:56:19,430 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 11 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 01:56:22,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-20 01:56:47,409 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.413e+01 2024-08-20 01:56:55,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=12.0 2024-08-20 01:56:56,617 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 01:57:00,187 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 12 from LS+wenet, 8 from Vox, 38 fro AS 2024-08-20 01:57:03,900 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2500, loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001122, whisper_loss=0.09074, over 19647.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001351, whisper_loss=0.08942, over 3777091.34 frames. ], batch size: 73, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:57:26,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.71 vs. limit=22.5 2024-08-20 01:57:58,223 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 01:58:18,655 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 01:58:32,183 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2550, loss[loss=0.09921, beats_loss=0.009597, ecapa_loss=0.0001587, whisper_loss=0.08802, over 21483.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001344, whisper_loss=0.08988, over 3816255.06 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:58:40,189 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 22 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-20 01:58:44,473 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.306e+01 2.523e+01 2.847e+01 3.512e+02, threshold=5.047e+01, percent-clipped=2.0 2024-08-20 01:59:17,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4619500.0, ans=0.125 2024-08-20 01:59:22,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4619500.0, ans=0.1 2024-08-20 01:59:36,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0 2024-08-20 01:59:38,593 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 01:59:43,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. 
limit=15.0 2024-08-20 01:59:48,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4619700.0, ans=0.0 2024-08-20 01:59:53,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4619700.0, ans=0.05 2024-08-20 01:59:58,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4619700.0, ans=0.0 2024-08-20 02:00:01,019 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2600, loss[loss=0.1072, beats_loss=0.009738, ecapa_loss=0.0001334, whisper_loss=0.09614, over 20896.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001365, whisper_loss=0.09014, over 3837023.48 frames. ], batch size: 83, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:00:11,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4619800.0, ans=0.125 2024-08-20 02:00:13,548 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 21 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-20 02:00:29,408 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 02:00:40,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4620000.0, ans=0.0 2024-08-20 02:00:43,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-20 02:00:55,712 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 02:01:16,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4620200.0, ans=0.0 2024-08-20 02:01:30,019 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2650, loss[loss=0.08374, beats_loss=0.0138, ecapa_loss=0.0001097, whisper_loss=0.06883, over 23134.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.09055, over 3829398.57 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:01:42,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. limit=10.0 2024-08-20 02:01:42,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.354e+01 2.571e+01 2.953e+01 6.961e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-20 02:01:45,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=22.5 2024-08-20 02:02:16,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4620500.0, ans=0.125 2024-08-20 02:02:18,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2024-08-20 02:02:24,551 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 02:02:31,279 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
30 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 02:02:36,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4620600.0, ans=0.125 2024-08-20 02:02:38,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2024-08-20 02:02:39,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4620700.0, ans=0.125 2024-08-20 02:02:51,335 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 02:02:58,547 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2700, loss[loss=0.1313, beats_loss=0.009358, ecapa_loss=0.000143, whisper_loss=0.1205, over 14706.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001381, whisper_loss=0.08986, over 3824677.38 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:03:10,938 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 02:03:13,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4620800.0, ans=0.1 2024-08-20 02:03:18,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4620900.0, ans=0.125 2024-08-20 02:03:25,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4620900.0, ans=0.2 2024-08-20 02:03:30,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-08-20 02:03:35,600 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 02:03:52,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4621100.0, ans=0.04949747468305833 2024-08-20 02:04:00,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-08-20 02:04:08,116 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 18 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 02:04:22,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=4621200.0, ans=12.0 2024-08-20 02:04:24,795 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2750, loss[loss=0.1033, beats_loss=0.007287, ecapa_loss=0.0001469, whisper_loss=0.09458, over 18923.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01034, ecapa_loss=0.0001391, whisper_loss=0.08954, over 3835520.33 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:04:32,035 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 02:04:36,899 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.283e+01 2.512e+01 2.707e+01 3.446e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 02:04:52,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4621400.0, ans=0.125 2024-08-20 02:05:05,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. 
limit=15.0 2024-08-20 02:05:23,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4621600.0, ans=0.125 2024-08-20 02:05:48,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4621700.0, ans=0.0 2024-08-20 02:05:53,034 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2800, loss[loss=0.1052, beats_loss=0.009644, ecapa_loss=0.0001294, whisper_loss=0.09422, over 18614.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.08922, over 3825744.83 frames. ], batch size: 74, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:06:08,854 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 02:06:19,061 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 27 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-20 02:06:24,874 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 02:06:33,969 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 02:06:52,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=22.5 2024-08-20 02:06:53,219 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 02:07:01,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4622100.0, ans=0.1 2024-08-20 02:07:06,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4622200.0, ans=0.125 2024-08-20 02:07:09,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4622200.0, ans=0.1 2024-08-20 02:07:22,914 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2850, loss[loss=0.1115, beats_loss=0.007639, ecapa_loss=0.0001317, whisper_loss=0.1026, over 16726.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001363, whisper_loss=0.08868, over 3812597.38 frames. ], batch size: 62, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:07:24,726 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 02:07:25,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4622300.0, ans=0.05 2024-08-20 02:07:27,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=12.0 2024-08-20 02:07:35,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.251e+01 2.480e+01 2.760e+01 4.318e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-20 02:07:37,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4622300.0, ans=0.0 2024-08-20 02:08:00,309 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 02:08:06,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2024-08-20 02:08:07,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4622500.0, ans=0.0 2024-08-20 02:08:11,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4622500.0, ans=0.0 2024-08-20 02:08:13,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4622500.0, ans=0.2 2024-08-20 02:08:24,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4622600.0, ans=0.2 2024-08-20 02:08:25,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4622600.0, ans=0.125 2024-08-20 02:08:33,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4622700.0, ans=0.0 2024-08-20 02:08:34,895 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-20 02:08:52,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-20 02:08:52,969 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2900, loss[loss=0.1081, beats_loss=0.01021, ecapa_loss=0.0001421, whisper_loss=0.09644, over 19994.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.000139, whisper_loss=0.08928, over 3805222.04 frames. 
], batch size: 78, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:08:59,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4622800.0, ans=0.2 2024-08-20 02:09:00,110 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 19 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-20 02:09:05,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.16 vs. limit=12.0 2024-08-20 02:09:07,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=8.0 2024-08-20 02:09:09,445 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 02:09:28,859 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 02:09:36,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4623000.0, ans=0.125 2024-08-20 02:10:04,623 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 02:10:06,857 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 33 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 02:10:07,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4623200.0, ans=0.125 2024-08-20 02:10:17,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4623200.0, ans=0.1 2024-08-20 02:10:22,388 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 2950, loss[loss=0.1181, beats_loss=0.009879, ecapa_loss=0.0001377, whisper_loss=0.1068, over 22388.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.0001396, whisper_loss=0.08943, over 3802549.71 frames. 
], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:10:24,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4623300.0, ans=0.125 2024-08-20 02:10:34,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.285e+01 2.491e+01 2.729e+01 3.693e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 02:10:44,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4623400.0, ans=0.05 2024-08-20 02:10:48,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4623400.0, ans=0.0 2024-08-20 02:10:49,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4623400.0, ans=0.1 2024-08-20 02:11:03,252 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 02:11:03,478 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:11:12,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4623500.0, ans=0.125 2024-08-20 02:11:24,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4623600.0, ans=0.125 2024-08-20 02:11:38,246 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 02:11:48,932 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3000, loss[loss=0.1181, beats_loss=0.009922, ecapa_loss=0.0001494, whisper_loss=0.1067, over 22486.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.08971, over 3826782.61 frames. 
], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:11:48,932 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 02:12:25,588 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.000511, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 02:12:46,529 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on SV_voxceleb1: loss=0.003941, beats_loss=0, ecapa_loss=0.0003941, whisper_loss=0, over 944235.00 frames. 2024-08-20 02:14:20,940 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on AT_audioset: loss=0.02293, beats_loss=0.02293, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 02:14:20,944 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 02:14:29,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4623800.0, ans=0.125 2024-08-20 02:14:32,487 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 02:14:52,558 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 14 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 02:15:06,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=12.0 2024-08-20 02:15:15,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4624100.0, ans=0.0 2024-08-20 02:15:44,446 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3050, loss[loss=0.09091, beats_loss=0.01265, ecapa_loss=0.0001225, whisper_loss=0.07703, over 18612.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001396, whisper_loss=0.09003, over 3843388.47 frames. 
], batch size: 73, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:15:56,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.348e+01 2.639e+01 2.982e+01 8.249e+01, threshold=5.278e+01, percent-clipped=1.0 2024-08-20 02:15:57,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4624300.0, ans=0.0 2024-08-20 02:16:05,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4624400.0, ans=0.125 2024-08-20 02:16:07,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4624400.0, ans=0.125 2024-08-20 02:16:12,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4624400.0, ans=0.0 2024-08-20 02:16:18,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4624500.0, ans=0.0 2024-08-20 02:16:29,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4624500.0, ans=0.125 2024-08-20 02:16:35,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4624600.0, ans=0.1 2024-08-20 02:16:54,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4624700.0, ans=0.0 2024-08-20 02:17:08,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4624800.0, ans=0.0 2024-08-20 02:17:09,621 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3100, loss[loss=0.09365, beats_loss=0.009775, ecapa_loss=0.0001542, whisper_loss=0.08233, over 19808.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001404, whisper_loss=0.09048, over 3851423.46 frames. ], batch size: 80, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:17:36,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4624900.0, ans=0.125 2024-08-20 02:17:47,652 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 20 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-20 02:17:54,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4625000.0, ans=0.0 2024-08-20 02:18:24,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4625200.0, ans=0.2 2024-08-20 02:18:33,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=12.0 2024-08-20 02:18:33,644 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3150, loss[loss=0.0653, beats_loss=0.0132, ecapa_loss=0.0001425, whisper_loss=0.05067, over 21751.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001403, whisper_loss=0.09008, over 3812055.55 frames. 
], batch size: 96, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:18:36,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4625300.0, ans=0.0 2024-08-20 02:18:39,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4625300.0, ans=0.09899494936611666 2024-08-20 02:18:44,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.262e+01 2.448e+01 2.716e+01 4.425e+01, threshold=4.896e+01, percent-clipped=0.0 2024-08-20 02:18:52,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4625400.0, ans=0.1 2024-08-20 02:19:23,454 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 22 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 02:19:24,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4625600.0, ans=0.125 2024-08-20 02:19:27,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4625600.0, ans=0.07 2024-08-20 02:19:31,820 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 36 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 02:19:36,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-20 02:19:38,498 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-20 02:19:41,874 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 02:19:42,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. 
limit=22.5 2024-08-20 02:19:48,759 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 02:19:49,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4625700.0, ans=0.1 2024-08-20 02:19:56,723 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3200, loss[loss=0.08335, beats_loss=0.01267, ecapa_loss=0.0001092, whisper_loss=0.06959, over 13050.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.08928, over 3802557.20 frames. ], batch size: 51, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:20:01,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4625800.0, ans=0.1 2024-08-20 02:20:02,293 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-20 02:20:29,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4626000.0, ans=0.125 2024-08-20 02:21:12,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-08-20 02:21:19,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=4626300.0, ans=0.02 2024-08-20 02:21:20,042 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3250, loss[loss=0.1126, beats_loss=0.01129, ecapa_loss=0.0001337, whisper_loss=0.09998, over 22639.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001396, whisper_loss=0.08892, over 3803037.67 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:21:26,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4626300.0, ans=0.0 2024-08-20 02:21:31,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4626300.0, ans=0.1 2024-08-20 02:21:32,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.285e+01 2.517e+01 2.834e+01 4.980e+01, threshold=5.034e+01, percent-clipped=1.0 2024-08-20 02:21:33,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.14 vs. limit=22.5 2024-08-20 02:21:56,160 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 32 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 02:21:58,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4626500.0, ans=0.05 2024-08-20 02:22:17,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4626600.0, ans=0.125 2024-08-20 02:22:20,252 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 02:22:46,854 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3300, loss[loss=0.09084, beats_loss=0.01142, ecapa_loss=9.032e-05, whisper_loss=0.07852, over 13535.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001398, whisper_loss=0.0901, over 3814216.83 frames. ], batch size: 50, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:23:00,446 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 
27 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-20 02:23:13,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2024-08-20 02:23:17,320 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 02:23:17,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4626900.0, ans=0.2 2024-08-20 02:23:37,701 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 02:24:06,195 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 02:24:06,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4627200.0, ans=0.1 2024-08-20 02:24:08,868 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3350, loss[loss=0.113, beats_loss=0.01085, ecapa_loss=0.0001228, whisper_loss=0.1009, over 23690.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001389, whisper_loss=0.09004, over 3827074.67 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:24:20,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.203e+01 2.406e+01 2.784e+01 4.307e+01, threshold=4.813e+01, percent-clipped=0.0 2024-08-20 02:24:23,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4627300.0, ans=0.125 2024-08-20 02:24:27,956 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 02:24:41,532 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 02:24:46,581 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 02:24:57,650 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 02:25:01,179 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 02:25:13,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4627600.0, ans=0.125 2024-08-20 02:25:32,824 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3400, loss[loss=0.09532, beats_loss=0.01234, ecapa_loss=0.0001101, whisper_loss=0.08187, over 17361.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08966, over 3785650.17 frames. ], batch size: 69, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:26:35,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4628100.0, ans=0.0 2024-08-20 02:26:55,324 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3450, loss[loss=0.1077, beats_loss=0.01038, ecapa_loss=0.0001187, whisper_loss=0.0961, over 23917.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.08911, over 3787228.53 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:26:56,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4628300.0, ans=0.125 2024-08-20 02:27:07,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.276e+01 2.600e+01 2.959e+01 4.699e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-20 02:27:24,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4628400.0, ans=10.0 2024-08-20 02:27:25,299 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
38 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-20 02:27:52,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4628600.0, ans=0.0 2024-08-20 02:28:11,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4628700.0, ans=0.125 2024-08-20 02:28:19,451 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3500, loss[loss=0.1206, beats_loss=0.008487, ecapa_loss=0.0001324, whisper_loss=0.1108, over 20162.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001402, whisper_loss=0.08956, over 3804318.52 frames. ], batch size: 75, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:28:28,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4628800.0, ans=0.0 2024-08-20 02:28:51,593 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 02:29:15,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4629100.0, ans=0.5 2024-08-20 02:29:30,265 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 02:29:33,422 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 02:29:44,375 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3550, loss[loss=0.1068, beats_loss=0.00955, ecapa_loss=0.0001606, whisper_loss=0.09561, over 21343.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.08935, over 3813849.90 frames. 
], batch size: 91, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:29:56,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.386e+01 2.605e+01 2.983e+01 3.766e+02, threshold=5.211e+01, percent-clipped=1.0 2024-08-20 02:30:05,244 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 02:30:11,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4629400.0, ans=0.0 2024-08-20 02:30:16,339 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 02:30:23,700 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 02:30:40,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4629600.0, ans=0.125 2024-08-20 02:31:03,734 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 29 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 02:31:15,270 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 02:31:24,052 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3600, loss[loss=0.09258, beats_loss=0.009442, ecapa_loss=0.0001433, whisper_loss=0.0817, over 14543.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001428, whisper_loss=0.08968, over 3771683.17 frames. ], batch size: 59, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:31:25,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4629800.0, ans=0.0 2024-08-20 02:31:39,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. 
limit=15.0 2024-08-20 02:32:20,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4630000.0, ans=0.1 2024-08-20 02:32:22,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.980e-03 2024-08-20 02:32:34,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4630100.0, ans=0.0 2024-08-20 02:33:02,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4630200.0, ans=0.125 2024-08-20 02:33:08,880 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 02:33:10,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.53 vs. limit=15.0 2024-08-20 02:33:15,262 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3650, loss[loss=0.1021, beats_loss=0.008976, ecapa_loss=0.0001331, whisper_loss=0.09176, over 16338.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001424, whisper_loss=0.08999, over 3759419.25 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:33:29,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.219e+01 2.439e+01 2.661e+01 4.108e+01, threshold=4.879e+01, percent-clipped=0.0 2024-08-20 02:33:40,864 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 02:34:00,480 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 16 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 02:34:12,120 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 02:34:42,299 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
17 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 02:34:52,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4630700.0, ans=0.125 2024-08-20 02:35:06,827 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3700, loss[loss=0.1063, beats_loss=0.0116, ecapa_loss=0.0001327, whisper_loss=0.09336, over 15246.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01028, ecapa_loss=0.0001415, whisper_loss=0.09014, over 3749592.04 frames. ], batch size: 57, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:35:09,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2024-08-20 02:35:18,443 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 18 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-20 02:35:18,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=4630800.0, ans=0.02 2024-08-20 02:35:22,697 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 02:35:26,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4630800.0, ans=0.0 2024-08-20 02:35:35,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4630900.0, ans=0.0 2024-08-20 02:35:50,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4630900.0, ans=0.09899494936611666 2024-08-20 02:35:54,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4631000.0, ans=0.125 2024-08-20 02:35:57,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.07 vs. 
limit=15.0 2024-08-20 02:36:00,631 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 02:36:15,120 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 02:36:26,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4631100.0, ans=0.125 2024-08-20 02:36:41,643 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 14 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-20 02:36:53,182 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 02:36:58,533 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3750, loss[loss=0.08069, beats_loss=0.0118, ecapa_loss=0.0001262, whisper_loss=0.06763, over 13484.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01028, ecapa_loss=0.0001427, whisper_loss=0.09063, over 3755849.53 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:37:07,705 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
22 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 02:37:13,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.276e+01 2.480e+01 2.901e+01 4.929e+01, threshold=4.959e+01, percent-clipped=1.0 2024-08-20 02:37:16,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4631300.0, ans=0.0 2024-08-20 02:37:20,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4631400.0, ans=0.125 2024-08-20 02:37:53,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4631500.0, ans=0.125 2024-08-20 02:38:08,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4631600.0, ans=0.125 2024-08-20 02:38:21,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4631600.0, ans=0.125 2024-08-20 02:38:23,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4631700.0, ans=0.0 2024-08-20 02:38:23,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4631700.0, ans=0.125 2024-08-20 02:38:23,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4631700.0, ans=0.125 2024-08-20 02:38:32,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0 2024-08-20 02:38:37,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=15.0 2024-08-20 02:38:45,178 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 02:38:46,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4631800.0, ans=0.125 2024-08-20 02:38:47,588 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3800, loss[loss=0.09298, beats_loss=0.01073, ecapa_loss=0.0001326, whisper_loss=0.08092, over 21746.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001412, whisper_loss=0.09029, over 3770084.25 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:38:51,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4631800.0, ans=0.125 2024-08-20 02:39:11,583 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 24 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-20 02:39:12,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4631900.0, ans=0.125 2024-08-20 02:40:36,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4632200.0, ans=0.125 2024-08-20 02:40:37,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4632200.0, ans=0.0 2024-08-20 02:40:40,665 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3850, loss[loss=0.09828, beats_loss=0.0122, ecapa_loss=0.0001436, whisper_loss=0.08464, over 22497.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001422, whisper_loss=0.09011, over 3769585.80 frames. 
], batch size: 91, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:40:55,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4632300.0, ans=0.0 2024-08-20 02:40:55,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.447e+01 2.712e+01 3.131e+01 3.132e+02, threshold=5.425e+01, percent-clipped=6.0 2024-08-20 02:40:56,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4632300.0, ans=0.1 2024-08-20 02:41:06,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4632400.0, ans=0.125 2024-08-20 02:41:12,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4632400.0, ans=0.0 2024-08-20 02:41:21,574 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 28 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 02:41:28,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=12.0 2024-08-20 02:41:42,821 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
21 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-20 02:42:09,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4632700.0, ans=0.04949747468305833 2024-08-20 02:42:16,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4632700.0, ans=0.125 2024-08-20 02:42:26,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4632800.0, ans=0.1 2024-08-20 02:42:26,926 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3900, loss[loss=0.09604, beats_loss=0.009373, ecapa_loss=0.0001236, whisper_loss=0.08543, over 17251.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001434, whisper_loss=0.09042, over 3791994.17 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:42:32,971 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 35 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 02:42:39,122 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 28 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-20 02:42:54,153 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 02:42:57,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4632900.0, ans=0.1 2024-08-20 02:43:20,433 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
25 from LS+wenet, 14 from Vox, 17 fro AS 2024-08-20 02:43:26,030 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:43:26,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4633000.0, ans=0.0 2024-08-20 02:43:41,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4633100.0, ans=0.0 2024-08-20 02:44:05,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-20 02:44:10,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-08-20 02:44:17,845 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 3950, loss[loss=0.1259, beats_loss=0.008206, ecapa_loss=0.0001222, whisper_loss=0.1165, over 22055.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01025, ecapa_loss=0.0001439, whisper_loss=0.09087, over 3790759.85 frames. ], batch size: 82, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:44:22,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.36 vs. limit=15.0 2024-08-20 02:44:31,155 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 16 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 02:44:33,328 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.316e+01 2.520e+01 2.771e+01 2.265e+02, threshold=5.040e+01, percent-clipped=2.0 2024-08-20 02:44:46,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4633400.0, ans=0.1 2024-08-20 02:45:42,884 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
18 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-20 02:45:46,009 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 39 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 02:46:09,268 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4000, loss[loss=0.1218, beats_loss=0.009116, ecapa_loss=0.000151, whisper_loss=0.1112, over 23001.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0102, ecapa_loss=0.0001445, whisper_loss=0.09168, over 3817477.12 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:46:20,707 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 02:46:35,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4633900.0, ans=0.0 2024-08-20 02:46:40,888 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 02:46:56,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4634000.0, ans=0.125 2024-08-20 02:46:58,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-20 02:47:07,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-08-20 02:47:12,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4634000.0, ans=0.125 2024-08-20 02:47:21,604 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 02:47:26,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4634100.0, ans=0.125 2024-08-20 02:47:30,831 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 02:47:33,167 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 02:47:38,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4634100.0, ans=0.125 2024-08-20 02:47:53,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4634200.0, ans=0.125 2024-08-20 02:48:05,840 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4050, loss[loss=0.09868, beats_loss=0.009862, ecapa_loss=0.0001269, whisper_loss=0.08755, over 19189.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01024, ecapa_loss=0.0001436, whisper_loss=0.09135, over 3829215.74 frames. ], batch size: 75, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:48:14,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4634300.0, ans=0.95 2024-08-20 02:48:22,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.303e+01 2.496e+01 2.881e+01 4.421e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-20 02:48:34,970 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 02:48:36,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4634400.0, ans=0.125 2024-08-20 02:48:41,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.83 vs. 
limit=15.0 2024-08-20 02:49:16,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4634500.0, ans=0.0 2024-08-20 02:50:06,622 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4100, loss[loss=0.06831, beats_loss=0.01017, ecapa_loss=0.0001599, whisper_loss=0.05654, over 15109.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0103, ecapa_loss=0.0001424, whisper_loss=0.0907, over 3808488.95 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:50:09,098 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 23 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 02:50:24,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2024-08-20 02:50:25,083 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 02:50:33,040 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 02:51:03,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4635000.0, ans=0.0 2024-08-20 02:51:13,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4635000.0, ans=0.0 2024-08-20 02:51:36,354 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 02:51:54,927 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 
19 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 02:51:55,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4635200.0, ans=0.1 2024-08-20 02:51:58,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4635200.0, ans=0.0 2024-08-20 02:52:00,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.71 vs. limit=22.5 2024-08-20 02:52:01,062 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4150, loss[loss=0.1113, beats_loss=0.009456, ecapa_loss=0.0001308, whisper_loss=0.1006, over 23483.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0103, ecapa_loss=0.0001422, whisper_loss=0.09066, over 3791227.43 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:52:12,678 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 02:52:16,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.376e+01 2.677e+01 2.991e+01 4.680e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-20 02:52:20,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4635300.0, ans=0.0 2024-08-20 02:53:13,030 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 02:53:26,541 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 02:53:52,214 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4200, loss[loss=0.132, beats_loss=0.009406, ecapa_loss=0.0001404, whisper_loss=0.1212, over 16946.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.0001424, whisper_loss=0.09083, over 3781015.37 frames. 
], batch size: 64, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:54:26,576 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 02:54:41,908 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 02:55:08,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=15.0 2024-08-20 02:55:19,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4636100.0, ans=0.1 2024-08-20 02:55:48,363 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4250, loss[loss=0.08503, beats_loss=0.01263, ecapa_loss=0.0001223, whisper_loss=0.07118, over 14781.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09031, over 3786607.81 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:56:00,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4636300.0, ans=0.05 2024-08-20 02:56:06,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.193e+01 2.444e+01 2.797e+01 4.359e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-20 02:56:32,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4636400.0, ans=6.0 2024-08-20 02:57:03,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4636600.0, ans=0.1 2024-08-20 02:57:23,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4636700.0, ans=0.125 2024-08-20 02:57:23,217 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4636700.0, ans=0.04949747468305833 2024-08-20 02:57:40,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4636700.0, ans=0.125 2024-08-20 02:57:40,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.50 vs. limit=10.0 2024-08-20 02:57:41,065 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 21 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 02:57:48,317 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4300, loss[loss=0.08768, beats_loss=0.01174, ecapa_loss=0.0001144, whisper_loss=0.0748, over 18241.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001418, whisper_loss=0.09089, over 3780942.61 frames. ], batch size: 73, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:58:11,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4636900.0, ans=0.125 2024-08-20 02:59:32,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4637200.0, ans=0.125 2024-08-20 02:59:52,024 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4350, loss[loss=0.07149, beats_loss=0.01076, ecapa_loss=0.0001569, whisper_loss=0.05916, over 16258.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001418, whisper_loss=0.09055, over 3796909.16 frames. ], batch size: 69, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:59:56,519 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
34 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-20 03:00:08,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.299e+01 2.481e+01 2.858e+01 4.859e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 03:00:11,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2024-08-20 03:00:44,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4637500.0, ans=0.125 2024-08-20 03:01:26,153 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 03:01:46,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4637700.0, ans=0.125 2024-08-20 03:01:53,493 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4400, loss[loss=0.104, beats_loss=0.01127, ecapa_loss=0.000142, whisper_loss=0.09133, over 23339.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01038, ecapa_loss=0.0001405, whisper_loss=0.0913, over 3831813.61 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:01:56,510 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 03:02:01,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2024-08-20 03:02:11,814 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
10 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 03:02:37,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4637900.0, ans=0.0 2024-08-20 03:02:47,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4638000.0, ans=0.1 2024-08-20 03:02:56,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4638000.0, ans=0.125 2024-08-20 03:02:56,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4638000.0, ans=0.0 2024-08-20 03:03:14,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4638100.0, ans=0.025 2024-08-20 03:03:24,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4638100.0, ans=0.125 2024-08-20 03:03:40,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4638200.0, ans=0.2 2024-08-20 03:03:55,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4638300.0, ans=0.125 2024-08-20 03:03:56,302 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4450, loss[loss=0.09134, beats_loss=0.01141, ecapa_loss=0.000141, whisper_loss=0.07852, over 14550.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.09044, over 3782896.97 frames. 
], batch size: 59, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:04:12,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.659e+01 2.158e+01 2.452e+01 2.719e+01 3.768e+01, threshold=4.904e+01, percent-clipped=0.0 2024-08-20 03:04:22,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4638400.0, ans=0.125 2024-08-20 03:04:27,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-20 03:04:28,083 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 03:04:41,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4638400.0, ans=0.125 2024-08-20 03:05:01,610 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 03:05:09,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4638600.0, ans=0.2 2024-08-20 03:05:14,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4638600.0, ans=0.1 2024-08-20 03:05:32,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4638600.0, ans=0.125 2024-08-20 03:05:46,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4638700.0, ans=0.125 2024-08-20 03:05:55,450 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-20 03:05:57,765 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
31 from LS+wenet, 19 from Vox, 37 fro AS
2024-08-20 03:06:00,015 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4500, loss[loss=0.1095, beats_loss=0.01084, ecapa_loss=0.0001313, whisper_loss=0.09738, over 21212.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.09089, over 3812438.16 frames. ], batch size: 87, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:06:22,379 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS
2024-08-20 03:07:09,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4639000.0, ans=0.125
2024-08-20 03:07:58,938 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS
2024-08-20 03:08:05,428 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4550, loss[loss=0.1124, beats_loss=0.008442, ecapa_loss=0.0001514, whisper_loss=0.1025, over 12989.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01028, ecapa_loss=0.0001421, whisper_loss=0.09104, over 3807007.04 frames. ], batch size: 49, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:08:09,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4639300.0, ans=0.125
2024-08-20 03:08:17,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4639300.0, ans=0.125
2024-08-20 03:08:23,755 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.329e+01 2.605e+01 2.856e+01 5.309e+01, threshold=5.211e+01, percent-clipped=1.0
2024-08-20 03:08:49,887 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 20 from LS+wenet, 25 from Vox, 28 fro AS
2024-08-20 03:09:32,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4639600.0, ans=0.1
2024-08-20 03:10:06,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0
2024-08-20 03:10:09,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0
2024-08-20 03:10:10,759 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 34 from LS+wenet, 22 from Vox, 29 fro AS
2024-08-20 03:10:13,203 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4600, loss[loss=0.1212, beats_loss=0.007644, ecapa_loss=0.0001394, whisper_loss=0.1122, over 22221.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01024, ecapa_loss=0.0001414, whisper_loss=0.09096, over 3816597.51 frames. ], batch size: 85, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:10:22,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4639800.0, ans=0.0
2024-08-20 03:10:37,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4639900.0, ans=0.5
2024-08-20 03:10:42,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4639900.0, ans=0.0
2024-08-20 03:10:51,822 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 34 from LS+wenet, 18 from Vox, 34 fro AS
2024-08-20 03:11:24,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4640000.0, ans=0.0
2024-08-20 03:11:57,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4640200.0, ans=0.125
2024-08-20 03:12:02,011 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 27 from LS+wenet, 20 from Vox, 38 fro AS
2024-08-20 03:12:03,833 INFO [train_multi_KD3.py:845] (3/4) A total of 98 cuts. 28 from LS+wenet, 29 from Vox, 41 fro AS
2024-08-20 03:12:18,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4640200.0, ans=0.1
2024-08-20 03:12:24,831 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4650, loss[loss=0.09426, beats_loss=0.01051, ecapa_loss=0.0001339, whisper_loss=0.08242, over 18908.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001403, whisper_loss=0.09111, over 3818079.75 frames. ], batch size: 73, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:12:32,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4640300.0, ans=0.125
2024-08-20 03:12:41,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.331e+01 2.446e+01 2.750e+01 3.848e+01, threshold=4.892e+01, percent-clipped=0.0
2024-08-20 03:12:51,688 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS
2024-08-20 03:13:16,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4640500.0, ans=0.125
2024-08-20 03:14:01,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4640600.0, ans=0.0
2024-08-20 03:14:11,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4640700.0, ans=0.0
2024-08-20 03:14:24,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0
2024-08-20 03:14:30,536 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4700, loss[loss=0.1249, beats_loss=0.008462, ecapa_loss=0.0001238, whisper_loss=0.1152, over 24669.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001394, whisper_loss=0.09073, over 3840147.04 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:14:30,743 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 18 from LS+wenet, 20 from Vox, 41 fro AS
2024-08-20 03:14:57,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4640900.0, ans=0.125
2024-08-20 03:15:00,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4640900.0, ans=0.2
2024-08-20 03:15:10,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4640900.0, ans=0.1
2024-08-20 03:15:18,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4640900.0, ans=0.1
2024-08-20 03:15:23,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5
2024-08-20 03:15:38,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0
2024-08-20 03:15:46,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4641100.0, ans=0.0
2024-08-20 03:15:50,251 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS
2024-08-20 03:16:04,679 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS
2024-08-20 03:16:21,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4641200.0, ans=0.1
2024-08-20 03:16:23,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=12.0
2024-08-20 03:16:25,971 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 13 from LS+wenet, 20 from Vox, 29 fro AS
2024-08-20 03:16:34,889 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4750, loss[loss=0.1145, beats_loss=0.009663, ecapa_loss=0.0001462, whisper_loss=0.1034, over 22209.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.0897, over 3827394.14 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:16:36,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4641300.0, ans=0.05
2024-08-20 03:16:44,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4641300.0, ans=0.2
2024-08-20 03:16:47,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=15.0
2024-08-20 03:16:52,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4641300.0, ans=0.2
2024-08-20 03:16:53,300 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.351e+01 2.626e+01 2.955e+01 4.641e+01, threshold=5.251e+01, percent-clipped=0.0
2024-08-20 03:17:18,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4641400.0, ans=0.0
2024-08-20 03:17:32,408 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 28 from LS+wenet, 25 from Vox, 25 fro AS
2024-08-20 03:17:33,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5
2024-08-20 03:17:41,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4641500.0, ans=0.125
2024-08-20 03:18:06,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4641600.0, ans=0.1
2024-08-20 03:18:23,270 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 14 from Vox, 45 fro AS
2024-08-20 03:18:40,182 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 03:18:40,917 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4800, loss[loss=0.08807, beats_loss=0.01363, ecapa_loss=0.0001248, whisper_loss=0.07319, over 17533.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.08948, over 3816896.22 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:18:42,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4641800.0, ans=0.1
2024-08-20 03:19:00,583 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 03:19:01,627 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 17 from LS+wenet, 20 from Vox, 21 fro AS
2024-08-20 03:19:12,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4641900.0, ans=0.125
2024-08-20 03:20:36,709 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS
2024-08-20 03:20:46,511 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4850, loss[loss=0.09596, beats_loss=0.01291, ecapa_loss=0.0001311, whisper_loss=0.08174, over 21931.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09003, over 3800297.86 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:21:02,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.316e+01 2.589e+01 3.055e+01 7.163e+01, threshold=5.178e+01, percent-clipped=1.0
2024-08-20 03:21:07,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4642300.0, ans=0.125
2024-08-20 03:21:25,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4642400.0, ans=0.125
2024-08-20 03:21:38,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4642500.0, ans=0.0
2024-08-20 03:21:47,634 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS
2024-08-20 03:22:06,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4642600.0, ans=0.2
2024-08-20 03:22:20,321 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS
2024-08-20 03:22:35,264 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4900, loss[loss=0.1196, beats_loss=0.009356, ecapa_loss=0.0001511, whisper_loss=0.1087, over 19246.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001401, whisper_loss=0.08973, over 3837440.17 frames. ], batch size: 76, lr: 1.92e-03, grad_scale: 5.764607523034235e+17
2024-08-20 03:22:47,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0
2024-08-20 03:22:59,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4642900.0, ans=0.2
2024-08-20 03:23:04,085 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 fro AS
2024-08-20 03:23:18,661 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 14 from LS+wenet, 25 from Vox, 31 fro AS
2024-08-20 03:23:21,188 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-20 03:23:34,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4643000.0, ans=0.1
2024-08-20 03:23:36,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5
2024-08-20 03:23:57,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4643200.0, ans=0.125
2024-08-20 03:24:06,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.78 vs. limit=15.0
2024-08-20 03:24:20,433 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 4950, loss[loss=0.1046, beats_loss=0.01079, ecapa_loss=0.0001207, whisper_loss=0.0926, over 22375.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.09019, over 3859105.99 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 1.152921504606847e+18
2024-08-20 03:24:34,463 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.306e+01 2.561e+01 2.855e+01 3.879e+01, threshold=5.122e+01, percent-clipped=0.0
2024-08-20 03:24:57,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4643400.0, ans=0.2
2024-08-20 03:25:04,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4643500.0, ans=0.0
2024-08-20 03:25:22,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0
2024-08-20 03:25:35,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.94 vs. limit=10.0
2024-08-20 03:25:55,376 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5000, loss[loss=0.06219, beats_loss=0.0137, ecapa_loss=0.0001027, whisper_loss=0.04747, over 16802.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001417, whisper_loss=0.09045, over 3870344.11 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:25:57,417 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 22 from LS+wenet, 22 from Vox, 28 fro AS
2024-08-20 03:26:09,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4643800.0, ans=0.04949747468305833
2024-08-20 03:26:16,367 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 31 from LS+wenet, 14 from Vox, 30 fro AS
2024-08-20 03:26:18,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4643900.0, ans=0.125
2024-08-20 03:26:26,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4643900.0, ans=0.0
2024-08-20 03:26:47,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4644000.0, ans=0.125
2024-08-20 03:26:53,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4644100.0, ans=0.125
2024-08-20 03:27:04,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4644100.0, ans=0.0
2024-08-20 03:27:17,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0
2024-08-20 03:27:20,154 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 24 from LS+wenet, 15 from Vox, 26 fro AS
2024-08-20 03:27:27,635 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5050, loss[loss=0.108, beats_loss=0.009003, ecapa_loss=0.0001354, whisper_loss=0.09761, over 23852.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.0899, over 3888486.19 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:27:31,191 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 18 from Vox, 50 fro AS
2024-08-20 03:27:33,021 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS
2024-08-20 03:27:38,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4644300.0, ans=0.125
2024-08-20 03:27:38,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4644300.0, ans=0.125
2024-08-20 03:27:44,295 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.280e+01 2.515e+01 2.844e+01 3.725e+01, threshold=5.031e+01, percent-clipped=0.0
2024-08-20 03:27:45,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4644400.0, ans=0.125
2024-08-20 03:28:06,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4644500.0, ans=0.0
2024-08-20 03:28:12,401 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-20 03:28:20,316 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 20 from LS+wenet, 15 from Vox, 18 fro AS
2024-08-20 03:28:24,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4644600.0, ans=0.1
2024-08-20 03:28:24,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4644600.0, ans=10.0
2024-08-20 03:28:28,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4644600.0, ans=0.125
2024-08-20 03:28:30,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4644600.0, ans=0.0
2024-08-20 03:28:40,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4644700.0, ans=0.125
2024-08-20 03:28:46,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4644700.0, ans=0.1
2024-08-20 03:28:51,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4644700.0, ans=0.125
2024-08-20 03:28:57,060 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5100, loss[loss=0.09024, beats_loss=0.01099, ecapa_loss=0.0001361, whisper_loss=0.07789, over 21968.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001402, whisper_loss=0.09038, over 3853412.93 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:29:09,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0
2024-08-20 03:29:20,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4644900.0, ans=0.125
2024-08-20 03:29:24,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4644900.0, ans=6.0
2024-08-20 03:29:49,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0
2024-08-20 03:30:27,065 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5150, loss[loss=0.1194, beats_loss=0.008065, ecapa_loss=0.0001555, whisper_loss=0.1098, over 21057.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001397, whisper_loss=0.08967, over 3827651.69 frames. ], batch size: 80, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:30:33,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0
2024-08-20 03:30:36,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4645300.0, ans=0.125
2024-08-20 03:30:42,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.233e+01 2.389e+01 2.694e+01 3.675e+01, threshold=4.778e+01, percent-clipped=0.0
2024-08-20 03:30:47,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0
2024-08-20 03:30:48,812 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 27 from LS+wenet, 22 from Vox, 31 fro AS
2024-08-20 03:30:50,421 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 13 from LS+wenet, 18 from Vox, 32 fro AS
2024-08-20 03:30:55,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4645400.0, ans=0.0
2024-08-20 03:31:54,536 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5200, loss[loss=0.1039, beats_loss=0.009822, ecapa_loss=0.0001316, whisper_loss=0.09274, over 15916.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001399, whisper_loss=0.0894, over 3813487.04 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:32:15,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0
2024-08-20 03:32:39,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4646000.0, ans=0.125
2024-08-20 03:32:43,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4646000.0, ans=0.125
2024-08-20 03:32:43,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.08 vs. limit=22.5
2024-08-20 03:32:59,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4646100.0, ans=0.2
2024-08-20 03:33:17,418 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS
2024-08-20 03:33:23,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4646300.0, ans=0.04949747468305833
2024-08-20 03:33:24,353 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5250, loss[loss=0.1088, beats_loss=0.008258, ecapa_loss=0.0001437, whisper_loss=0.09912, over 19906.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001402, whisper_loss=0.09057, over 3840794.83 frames. ], batch size: 81, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:33:40,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.321e+01 2.600e+01 2.824e+01 7.148e+01, threshold=5.200e+01, percent-clipped=2.0
2024-08-20 03:33:53,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=12.0
2024-08-20 03:33:56,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4646400.0, ans=0.125
2024-08-20 03:34:03,183 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS
2024-08-20 03:34:14,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0
2024-08-20 03:34:29,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4646600.0, ans=0.0
2024-08-20 03:34:43,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4646700.0, ans=0.2
2024-08-20 03:34:46,476 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 16 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-20 03:34:55,856 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5300, loss[loss=0.09394, beats_loss=0.01273, ecapa_loss=0.0001146, whisper_loss=0.08007, over 17657.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.08998, over 3823963.65 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:35:20,246 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS
2024-08-20 03:35:28,760 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 20 from LS+wenet, 28 from Vox, 42 fro AS
2024-08-20 03:36:06,229 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS
2024-08-20 03:36:28,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4647200.0, ans=0.2
2024-08-20 03:36:31,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=12.0
2024-08-20 03:36:36,421 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5350, loss[loss=0.05795, beats_loss=0.01061, ecapa_loss=0.0001636, whisper_loss=0.04571, over 12092.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.08917, over 3817301.90 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:36:38,817 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 10 from LS+wenet, 19 from Vox, 23 fro AS
2024-08-20 03:36:55,623 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 03:36:55,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0
2024-08-20 03:36:57,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.184e+01 2.426e+01 2.687e+01 4.168e+01, threshold=4.852e+01, percent-clipped=0.0
2024-08-20 03:37:03,085 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 fro AS
2024-08-20 03:37:12,813 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 8 from LS+wenet, 11 from Vox, 31 fro AS
2024-08-20 03:37:50,069 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS
2024-08-20 03:38:35,826 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5400, loss[loss=0.1102, beats_loss=0.008677, ecapa_loss=0.0001384, whisper_loss=0.1001, over 14049.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08888, over 3796400.10 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:38:38,049 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-20 03:38:39,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4647800.0, ans=0.125
2024-08-20 03:38:58,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4647900.0, ans=0.1
2024-08-20 03:39:26,055 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS
2024-08-20 03:39:32,513 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS
2024-08-20 03:40:23,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5
2024-08-20 03:40:24,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=12.0
2024-08-20 03:40:28,663 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5450, loss[loss=0.09581, beats_loss=0.01039, ecapa_loss=0.0001742, whisper_loss=0.08367, over 21543.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001394, whisper_loss=0.08893, over 3800573.08 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:40:43,190 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=12.0
2024-08-20 03:40:45,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.272e+01 2.507e+01 2.790e+01 3.633e+01, threshold=5.013e+01, percent-clipped=0.0
2024-08-20 03:40:55,456 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0
2024-08-20 03:41:11,808 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS
2024-08-20 03:41:21,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4648500.0, ans=0.0
2024-08-20 03:41:51,475 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 29 from LS+wenet, 19 from Vox, 19 fro AS
2024-08-20 03:42:18,163 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5500, loss[loss=0.1102, beats_loss=0.01118, ecapa_loss=0.0001688, whisper_loss=0.09735, over 19964.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01031, ecapa_loss=0.00014, whisper_loss=0.08953, over 3817632.89 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:42:22,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4648800.0, ans=0.125
2024-08-20 03:42:32,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4648800.0, ans=0.0
2024-08-20 03:42:50,415 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 27 from LS+wenet, 30 from Vox, 38 fro AS
2024-08-20 03:43:52,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0
2024-08-20 03:43:56,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4649200.0, ans=0.125
2024-08-20 03:44:05,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4649200.0, ans=0.0
2024-08-20 03:44:11,990 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5550, loss[loss=0.09355, beats_loss=0.01166, ecapa_loss=0.0001375, whisper_loss=0.08051, over 21587.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01033, ecapa_loss=0.0001405, whisper_loss=0.08978, over 3808054.29 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:44:29,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4649300.0, ans=0.125
2024-08-20 03:44:35,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.290e+01 2.579e+01 2.821e+01 2.823e+02, threshold=5.158e+01, percent-clipped=2.0
2024-08-20 03:44:40,217 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 17 from LS+wenet, 23 from Vox, 15 fro AS
2024-08-20 03:45:17,737 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 25 from Vox, 21 fro AS
2024-08-20 03:46:11,406 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5600, loss[loss=0.1119, beats_loss=0.008441, ecapa_loss=0.0001731, whisper_loss=0.1017, over 18612.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01028, ecapa_loss=0.0001415, whisper_loss=0.08954, over 3820790.72 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:47:15,751 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 11 from LS+wenet, 16 from Vox, 32 fro AS
2024-08-20 03:47:58,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4650300.0, ans=0.125
2024-08-20 03:47:59,306 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5650, loss[loss=0.09656, beats_loss=0.01081, ecapa_loss=0.0001115, whisper_loss=0.08464, over 21165.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001421, whisper_loss=0.09004, over 3840958.17 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:48:06,561 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS
2024-08-20 03:48:10,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4650300.0, ans=0.125
2024-08-20 03:48:18,571 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 10 from LS+wenet, 13 from Vox, 27 fro AS
2024-08-20 03:48:20,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.429e+01 2.607e+01 2.937e+01 4.534e+02, threshold=5.214e+01, percent-clipped=3.0
2024-08-20 03:49:11,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4650600.0, ans=10.0
2024-08-20 03:49:34,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4650700.0, ans=0.0
2024-08-20 03:49:54,591 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5700, loss[loss=0.1004, beats_loss=0.01084, ecapa_loss=0.0001438, whisper_loss=0.08813, over 12859.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01026, ecapa_loss=0.0001426, whisper_loss=0.09016, over 3814225.74 frames. ], batch size: 50, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:50:53,029 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS
2024-08-20 03:51:12,917 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 19 from LS+wenet, 22 from Vox, 26 fro AS
2024-08-20 03:51:21,800 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0
2024-08-20 03:51:41,482 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5750, loss[loss=0.1176, beats_loss=0.007994, ecapa_loss=0.0001794, whisper_loss=0.1078, over 17107.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01029, ecapa_loss=0.0001429, whisper_loss=0.08961, over 3792507.41 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:51:52,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0
2024-08-20 03:51:55,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4651300.0, ans=0.1
2024-08-20 03:52:01,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.309e+01 2.653e+01 2.956e+01 1.340e+02, threshold=5.306e+01, percent-clipped=1.0
2024-08-20 03:52:20,339 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS
2024-08-20 03:52:25,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4651500.0, ans=0.125
2024-08-20 03:52:27,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4651500.0, ans=0.2
2024-08-20 03:53:00,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5
2024-08-20 03:53:06,339 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS
2024-08-20 03:53:17,666 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS
2024-08-20 03:53:23,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4651700.0, ans=0.0
2024-08-20 03:53:30,707 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5800, loss[loss=0.1179, beats_loss=0.009091, ecapa_loss=0.0001051, whisper_loss=0.1078, over 16491.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001431, whisper_loss=0.08938, over 3848521.07 frames. ], batch size: 58, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 03:53:35,371 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 32 from LS+wenet, 18 from Vox, 24 fro AS
2024-08-20 03:53:45,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4651800.0, ans=0.2
2024-08-20 03:53:55,774 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS
2024-08-20 03:53:57,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=22.5
2024-08-20 03:53:57,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0
2024-08-20 03:54:07,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs.
limit=15.0 2024-08-20 03:54:15,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4652000.0, ans=0.125 2024-08-20 03:54:39,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4652100.0, ans=0.125 2024-08-20 03:54:42,956 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 03:55:01,360 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 03:55:15,726 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5850, loss[loss=0.09715, beats_loss=0.007578, ecapa_loss=0.0001291, whisper_loss=0.08828, over 17385.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001428, whisper_loss=0.08926, over 3838617.01 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:55:19,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4652300.0, ans=0.125 2024-08-20 03:55:34,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.203e+01 2.512e+01 2.750e+01 3.616e+02, threshold=5.024e+01, percent-clipped=2.0 2024-08-20 03:55:46,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4652400.0, ans=0.07 2024-08-20 03:56:06,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4652500.0, ans=0.125 2024-08-20 03:56:07,268 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
22 from LS+wenet, 30 from Vox, 41 fro AS 2024-08-20 03:56:09,185 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.433e+00 2024-08-20 03:56:28,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4652600.0, ans=0.125 2024-08-20 03:56:32,881 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-20 03:56:36,479 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 14 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 03:56:52,098 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 03:57:05,956 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5900, loss[loss=0.08579, beats_loss=0.0124, ecapa_loss=0.0001107, whisper_loss=0.07229, over 14570.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001429, whisper_loss=0.08924, over 3842075.79 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:57:07,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4652800.0, ans=0.125 2024-08-20 03:57:42,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4652900.0, ans=0.0 2024-08-20 03:57:42,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4652900.0, ans=0.125 2024-08-20 03:57:50,659 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 03:58:13,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2024-08-20 03:58:22,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4653100.0, ans=0.125 2024-08-20 03:58:38,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0 2024-08-20 03:58:59,963 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 5950, loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001147, whisper_loss=0.09173, over 17550.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.000143, whisper_loss=0.08918, over 3817607.50 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:59:21,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.326e+01 2.621e+01 2.901e+01 3.816e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-20 03:59:31,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0 2024-08-20 03:59:42,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
limit=6.0 2024-08-20 04:00:08,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4653600.0, ans=0.125 2024-08-20 04:00:12,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4653600.0, ans=0.1 2024-08-20 04:00:12,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4653600.0, ans=0.125 2024-08-20 04:00:28,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4653700.0, ans=0.125 2024-08-20 04:00:46,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4653700.0, ans=0.1 2024-08-20 04:00:49,267 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6000, loss[loss=0.1028, beats_loss=0.01145, ecapa_loss=0.0001314, whisper_loss=0.09007, over 22190.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001422, whisper_loss=0.08878, over 3772053.40 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:00:49,267 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 04:01:25,922 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005122, whisper_loss=0.2485, over 931116.00 frames. 2024-08-20 04:01:50,511 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames. 2024-08-20 04:03:25,365 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 04:03:25,369 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB
2024-08-20 04:03:27,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4653800.0, ans=0.05
2024-08-20 04:03:27,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4653800.0, ans=0.1
2024-08-20 04:03:36,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0
2024-08-20 04:04:02,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4654000.0, ans=0.125
2024-08-20 04:04:05,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4654000.0, ans=0.2
2024-08-20 04:04:08,324 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 13 from LS+wenet, 35 from Vox, 39 from AS
2024-08-20 04:04:44,261 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 from AS
2024-08-20 04:04:48,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4654200.0, ans=0.125
2024-08-20 04:04:54,631 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6050, loss[loss=0.1045, beats_loss=0.01093, ecapa_loss=0.0001421, whisper_loss=0.09215, over 21542.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01053, ecapa_loss=0.0001411, whisper_loss=0.08871, over 3814686.75 frames. ], batch size: 86, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:04:54,861 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 from AS
2024-08-20 04:05:09,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.277e+01 2.536e+01 2.822e+01 4.959e+01, threshold=5.072e+01, percent-clipped=0.0
2024-08-20 04:05:10,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4654400.0, ans=0.5
2024-08-20 04:05:11,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4654400.0, ans=0.125
2024-08-20 04:05:58,123 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 04:06:24,006 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6100, loss[loss=0.09965, beats_loss=0.009465, ecapa_loss=0.0001861, whisper_loss=0.08833, over 20583.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01055, ecapa_loss=0.0001414, whisper_loss=0.08863, over 3799429.96 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:06:31,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4654800.0, ans=0.2
2024-08-20 04:06:46,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4654900.0, ans=0.0
2024-08-20 04:06:53,577 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 28 from LS+wenet, 34 from Vox, 33 from AS
2024-08-20 04:07:33,682 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 21 from LS+wenet, 12 from Vox, 18 from AS
2024-08-20 04:07:53,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0
2024-08-20 04:08:00,080 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS
2024-08-20 04:08:11,822 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6150, loss[loss=0.1025, beats_loss=0.009969, ecapa_loss=9.383e-05, whisper_loss=0.09157, over 16384.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001404, whisper_loss=0.08897, over 3859936.64 frames. ], batch size: 58, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:08:17,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4655300.0, ans=0.0
2024-08-20 04:08:31,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.289e+01 2.520e+01 2.857e+01 4.942e+02, threshold=5.040e+01, percent-clipped=2.0
2024-08-20 04:08:32,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4655400.0, ans=0.035
2024-08-20 04:08:36,555 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 from AS
2024-08-20 04:08:39,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4655400.0, ans=0.125
2024-08-20 04:08:40,503 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 32 from LS+wenet, 13 from Vox, 32 from AS
2024-08-20 04:08:52,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4655400.0, ans=0.0
2024-08-20 04:08:58,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4655500.0, ans=0.125
2024-08-20 04:09:07,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4655500.0, ans=0.125
2024-08-20 04:09:12,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=4655500.0, ans=0.02
2024-08-20 04:09:33,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4655600.0, ans=0.125
2024-08-20 04:10:01,758 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6200, loss[loss=0.1042, beats_loss=0.00816, ecapa_loss=0.0001301, whisper_loss=0.09478, over 14009.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001399, whisper_loss=0.08921, over 3863605.38 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:10:08,619 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 from AS
2024-08-20 04:10:25,077 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 35 from LS+wenet, 12 from Vox, 40 from AS
2024-08-20 04:10:38,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0
2024-08-20 04:10:41,268 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 18 from LS+wenet, 23 from Vox, 37 from AS
2024-08-20 04:10:47,747 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS
2024-08-20 04:11:09,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4656100.0, ans=0.125
2024-08-20 04:11:22,445 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 from AS
2024-08-20 04:11:42,017 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 from AS
2024-08-20 04:11:50,493 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6250, loss[loss=0.1265, beats_loss=0.007891, ecapa_loss=0.0001578, whisper_loss=0.1171, over 22540.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01063, ecapa_loss=0.0001395, whisper_loss=0.08886, over 3862380.77 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:11:51,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4656300.0, ans=0.125
2024-08-20 04:12:09,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.244e+01 2.486e+01 2.895e+01 5.036e+01, threshold=4.971e+01, percent-clipped=0.0
2024-08-20 04:12:13,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4656400.0, ans=0.0
2024-08-20 04:12:16,147 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.890e+01
2024-08-20 04:12:23,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4656400.0, ans=0.125
2024-08-20 04:12:29,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4656400.0, ans=0.2
2024-08-20 04:12:34,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=4656500.0, ans=0.2
2024-08-20 04:12:54,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4656500.0, ans=0.125
2024-08-20 04:13:13,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4656600.0, ans=0.0
2024-08-20 04:13:26,792 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 from AS
2024-08-20 04:13:41,018 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6300, loss[loss=0.09209, beats_loss=0.01248, ecapa_loss=0.0001454, whisper_loss=0.07815, over 21902.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01061, ecapa_loss=0.0001408, whisper_loss=0.08927, over 3887817.97 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:14:20,993 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 from AS
2024-08-20 04:14:30,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4657000.0, ans=0.2
2024-08-20 04:14:39,580 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 from AS
2024-08-20 04:15:12,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.66 vs. limit=10.0
2024-08-20 04:15:23,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4657200.0, ans=0.1
2024-08-20 04:15:33,937 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 04:15:36,374 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6350, loss[loss=0.1032, beats_loss=0.01139, ecapa_loss=0.0001367, whisper_loss=0.09047, over 20733.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001406, whisper_loss=0.08967, over 3877172.80 frames. ], batch size: 84, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:15:38,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4657300.0, ans=0.125
2024-08-20 04:15:52,154 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 from AS
2024-08-20 04:15:56,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.231e+01 2.542e+01 2.829e+01 6.825e+01, threshold=5.084e+01, percent-clipped=1.0
2024-08-20 04:16:18,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4657400.0, ans=0.1
2024-08-20 04:16:23,027 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 from AS
2024-08-20 04:16:29,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4657500.0, ans=0.125
2024-08-20 04:16:34,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4657500.0, ans=0.2
2024-08-20 04:16:51,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4657600.0, ans=0.125
2024-08-20 04:17:01,184 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.776e+00
2024-08-20 04:17:10,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4657700.0, ans=0.125
2024-08-20 04:17:21,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4657700.0, ans=0.2
2024-08-20 04:17:25,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4657800.0, ans=0.125
2024-08-20 04:17:26,692 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6400, loss[loss=0.1167, beats_loss=0.01077, ecapa_loss=0.0001372, whisper_loss=0.1045, over 18155.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09, over 3849040.79 frames. ], batch size: 72, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:17:37,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0
2024-08-20 04:17:43,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4657800.0, ans=0.0
2024-08-20 04:18:06,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4657900.0, ans=0.125
2024-08-20 04:18:24,624 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS
2024-08-20 04:18:35,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4658100.0, ans=0.0
2024-08-20 04:18:35,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0
2024-08-20 04:18:40,831 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 from AS
2024-08-20 04:18:49,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4658100.0, ans=0.0
2024-08-20 04:18:56,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4658200.0, ans=0.07
2024-08-20 04:19:03,397 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 18 from LS+wenet, 14 from Vox, 19 from AS
2024-08-20 04:19:18,229 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6450, loss[loss=0.109, beats_loss=0.01072, ecapa_loss=0.0001311, whisper_loss=0.09699, over 15666.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.000142, whisper_loss=0.09047, over 3845021.71 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:19:37,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4658300.0, ans=0.125
2024-08-20 04:19:38,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.209e+01 2.444e+01 2.735e+01 9.511e+01, threshold=4.888e+01, percent-clipped=1.0
2024-08-20 04:19:53,903 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 from AS
2024-08-20 04:19:54,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4658400.0, ans=0.035
2024-08-20 04:20:21,676 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 28 from LS+wenet, 16 from Vox, 32 from AS
2024-08-20 04:20:26,227 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 21 from LS+wenet, 21 from Vox, 49 from AS
2024-08-20 04:20:50,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4658700.0, ans=0.0
2024-08-20 04:20:58,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0
2024-08-20 04:21:08,664 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 04:21:11,438 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6500, loss[loss=0.08499, beats_loss=0.01045, ecapa_loss=0.0001541, whisper_loss=0.073, over 15914.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.000142, whisper_loss=0.09086, over 3809787.64 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:21:16,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4658800.0, ans=0.2
2024-08-20 04:21:31,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4658900.0, ans=0.1
2024-08-20 04:21:34,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4658900.0, ans=0.125
2024-08-20 04:21:57,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4659000.0, ans=0.125
2024-08-20 04:22:06,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4659000.0, ans=0.0
2024-08-20 04:22:33,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4659100.0, ans=0.2
2024-08-20 04:22:57,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=4659200.0, ans=0.1
2024-08-20 04:23:02,361 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6550, loss[loss=0.09666, beats_loss=0.007905, ecapa_loss=0.0001503, whisper_loss=0.08725, over 17509.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01024, ecapa_loss=0.0001431, whisper_loss=0.09177, over 3856240.25 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:23:11,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4659300.0, ans=0.0
2024-08-20 04:23:17,469 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 26 from Vox, 15 from AS
2024-08-20 04:23:23,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.308e+01 2.565e+01 2.877e+01 4.180e+01, threshold=5.130e+01, percent-clipped=0.0
2024-08-20 04:24:29,522 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 04:24:55,665 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 from AS
2024-08-20 04:24:57,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4659700.0, ans=0.125
2024-08-20 04:25:01,162 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6600, loss[loss=0.1042, beats_loss=0.01001, ecapa_loss=0.0001724, whisper_loss=0.0925, over 18531.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01021, ecapa_loss=0.0001442, whisper_loss=0.0916, over 3836925.43 frames. ], batch size: 78, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:25:03,505 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 from AS
2024-08-20 04:25:26,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4659900.0, ans=0.125
2024-08-20 04:25:30,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4659900.0, ans=0.125
2024-08-20 04:25:38,419 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 from AS
2024-08-20 04:25:42,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4659900.0, ans=0.1
2024-08-20 04:25:44,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4660000.0, ans=0.125
2024-08-20 04:26:08,064 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 from AS
2024-08-20 04:26:41,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4660200.0, ans=10.0
2024-08-20 04:26:52,842 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6650, loss[loss=0.1152, beats_loss=0.009241, ecapa_loss=0.0001531, whisper_loss=0.1045, over 22967.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01027, ecapa_loss=0.0001435, whisper_loss=0.0917, over 3846918.80 frames. ], batch size: 94, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:26:56,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4660300.0, ans=15.0
2024-08-20 04:27:03,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4660300.0, ans=0.0
2024-08-20 04:27:14,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.439e+01 2.716e+01 3.206e+01 5.057e+01, threshold=5.432e+01, percent-clipped=0.0
2024-08-20 04:28:33,318 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 19 from LS+wenet, 26 from Vox, 38 from AS
2024-08-20 04:28:37,899 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 28 from LS+wenet, 16 from Vox, 42 from AS
2024-08-20 04:28:51,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0
2024-08-20 04:28:52,039 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6700, loss[loss=0.08505, beats_loss=0.01172, ecapa_loss=0.0001289, whisper_loss=0.07203, over 17646.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01031, ecapa_loss=0.0001442, whisper_loss=0.09129, over 3868700.68 frames. ], batch size: 71, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:29:41,509 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 25 from Vox, 26 from AS
2024-08-20 04:29:43,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0
2024-08-20 04:29:51,120 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 23 from LS+wenet, 14 from Vox, 32 from AS
2024-08-20 04:29:58,535 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 from AS
2024-08-20 04:30:37,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4661200.0, ans=0.0
2024-08-20 04:30:41,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4661200.0, ans=0.2
2024-08-20 04:30:48,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4661300.0, ans=0.125
2024-08-20 04:30:49,568 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6750, loss[loss=0.1115, beats_loss=0.009313, ecapa_loss=0.0001362, whisper_loss=0.1008, over 22351.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01028, ecapa_loss=0.000144, whisper_loss=0.09192, over 3921392.53 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 04:30:50,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4661300.0, ans=0.125
2024-08-20 04:30:59,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0
2024-08-20 04:31:03,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.65 vs. limit=22.5
2024-08-20 04:31:08,749 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.262e+01 2.503e+01 2.805e+01 3.998e+01, threshold=5.006e+01, percent-clipped=0.0
2024-08-20 04:31:29,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4661400.0, ans=0.0
2024-08-20 04:31:29,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4661400.0, ans=0.1
2024-08-20 04:31:36,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4661500.0, ans=0.0
2024-08-20 04:31:40,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4661500.0, ans=0.125
2024-08-20 04:31:44,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2024-08-20 04:31:45,739 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS
2024-08-20 04:31:51,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4661500.0, ans=0.07
2024-08-20 04:31:54,757 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts.
21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 04:32:41,871 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6800, loss[loss=0.1194, beats_loss=0.009038, ecapa_loss=0.0001224, whisper_loss=0.1092, over 20425.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01036, ecapa_loss=0.000143, whisper_loss=0.09188, over 3919475.39 frames. ], batch size: 75, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:33:10,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4661900.0, ans=0.1 2024-08-20 04:33:22,838 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 04:33:57,462 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 04:34:21,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-20 04:34:35,307 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6850, loss[loss=0.1122, beats_loss=0.009657, ecapa_loss=0.0001559, whisper_loss=0.101, over 20049.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001427, whisper_loss=0.0906, over 3899695.60 frames. ], batch size: 82, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:34:51,741 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 04:34:55,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.273e+01 2.508e+01 2.881e+01 4.383e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 04:35:09,107 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 19 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 04:35:26,875 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 
26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 04:35:29,215 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 04:36:09,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4662700.0, ans=0.125 2024-08-20 04:36:20,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4662700.0, ans=0.125 2024-08-20 04:36:23,906 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 04:36:27,892 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6900, loss[loss=0.1064, beats_loss=0.01093, ecapa_loss=0.0001791, whisper_loss=0.09363, over 22778.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001427, whisper_loss=0.09009, over 3894439.48 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:36:28,081 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 04:36:40,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4662800.0, ans=0.0 2024-08-20 04:36:42,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4662800.0, ans=0.0 2024-08-20 04:36:52,017 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
20 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-20 04:37:03,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4662900.0, ans=0.0 2024-08-20 04:37:05,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4662900.0, ans=0.125 2024-08-20 04:37:31,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2024-08-20 04:38:03,407 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 20 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-20 04:38:14,716 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 6950, loss[loss=0.09313, beats_loss=0.01153, ecapa_loss=0.0001459, whisper_loss=0.08014, over 18801.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001419, whisper_loss=0.08983, over 3898242.01 frames. ], batch size: 77, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:38:27,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4663300.0, ans=0.125 2024-08-20 04:38:31,502 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-20 04:38:35,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4663300.0, ans=0.125 2024-08-20 04:38:35,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.402e+01 2.667e+01 2.923e+01 3.663e+02, threshold=5.334e+01, percent-clipped=2.0 2024-08-20 04:38:52,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4663400.0, ans=0.2 2024-08-20 04:39:37,253 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
25 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 04:39:55,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4663700.0, ans=0.125 2024-08-20 04:39:58,163 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7000, loss[loss=0.09746, beats_loss=0.009877, ecapa_loss=0.0001435, whisper_loss=0.08615, over 21370.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001412, whisper_loss=0.08926, over 3863188.52 frames. ], batch size: 83, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:40:04,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4663800.0, ans=0.2 2024-08-20 04:40:27,682 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 04:40:54,805 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 04:40:57,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4664100.0, ans=0.0 2024-08-20 04:41:08,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4664100.0, ans=0.125 2024-08-20 04:41:22,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4664200.0, ans=0.125 2024-08-20 04:41:28,058 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 39 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 04:41:31,591 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7050, loss[loss=0.09379, beats_loss=0.009593, ecapa_loss=0.000177, whisper_loss=0.08243, over 19105.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001411, whisper_loss=0.09026, over 3858474.74 frames. 
], batch size: 81, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:41:47,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.321e+01 2.580e+01 2.916e+01 2.806e+02, threshold=5.159e+01, percent-clipped=2.0 2024-08-20 04:41:58,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4664400.0, ans=0.1 2024-08-20 04:42:02,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4664400.0, ans=0.125 2024-08-20 04:42:13,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4664500.0, ans=0.0 2024-08-20 04:42:41,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4664600.0, ans=0.125 2024-08-20 04:42:54,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4664700.0, ans=10.0 2024-08-20 04:43:05,571 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7100, loss[loss=0.1019, beats_loss=0.009229, ecapa_loss=0.0001259, whisper_loss=0.09138, over 13353.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.09005, over 3851761.73 frames. ], batch size: 49, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:43:08,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4664800.0, ans=0.125 2024-08-20 04:43:19,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. 
limit=22.5 2024-08-20 04:43:22,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-20 04:43:31,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4664900.0, ans=0.0 2024-08-20 04:43:34,477 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 04:43:35,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4664900.0, ans=0.1 2024-08-20 04:43:38,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4664900.0, ans=0.125 2024-08-20 04:43:41,188 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 04:44:12,351 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 18 from LS+wenet, 10 from Vox, 39 fro AS 2024-08-20 04:44:35,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4665200.0, ans=0.125 2024-08-20 04:44:42,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4665200.0, ans=0.125 2024-08-20 04:44:57,412 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7150, loss[loss=0.1026, beats_loss=0.008092, ecapa_loss=0.0001324, whisper_loss=0.09316, over 18898.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001401, whisper_loss=0.0893, over 3836883.28 frames. 
], batch size: 71, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:45:12,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4665300.0, ans=0.2 2024-08-20 04:45:16,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.63 vs. limit=22.5 2024-08-20 04:45:17,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.230e+01 2.408e+01 2.713e+01 4.387e+01, threshold=4.817e+01, percent-clipped=0.0 2024-08-20 04:45:31,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4665400.0, ans=0.2 2024-08-20 04:46:27,857 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 04:46:33,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4665700.0, ans=0.0 2024-08-20 04:46:45,592 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 30 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 04:46:52,137 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7200, loss[loss=0.09097, beats_loss=0.0119, ecapa_loss=0.0001093, whisper_loss=0.07798, over 17534.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001401, whisper_loss=0.0893, over 3816996.70 frames. ], batch size: 69, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:47:16,427 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
38 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 04:47:21,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4665900.0, ans=0.125 2024-08-20 04:47:31,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4665900.0, ans=0.125 2024-08-20 04:47:42,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-08-20 04:47:48,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4666000.0, ans=0.0 2024-08-20 04:47:52,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4666000.0, ans=0.125 2024-08-20 04:48:00,899 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 33 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-20 04:48:03,369 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 22 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-20 04:48:44,277 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7250, loss[loss=0.08996, beats_loss=0.009954, ecapa_loss=0.0001329, whisper_loss=0.07868, over 22344.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001395, whisper_loss=0.08986, over 3830179.90 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:49:04,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.278e+01 2.449e+01 2.713e+01 3.965e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:49:09,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4666400.0, ans=0.125 2024-08-20 04:49:10,304 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 04:49:46,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4666500.0, ans=0.025 2024-08-20 04:49:52,031 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 04:49:53,757 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 04:50:33,856 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7300, loss[loss=0.1171, beats_loss=0.009158, ecapa_loss=0.0001528, whisper_loss=0.1064, over 23029.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0105, ecapa_loss=0.0001399, whisper_loss=0.08917, over 3815784.12 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:50:41,411 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 04:50:56,859 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 04:50:58,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4666900.0, ans=0.0 2024-08-20 04:51:20,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2024-08-20 04:51:44,534 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 04:51:46,689 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 04:51:52,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4667100.0, ans=0.0 2024-08-20 04:52:20,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4667200.0, ans=0.0 2024-08-20 04:52:29,464 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7350, loss[loss=0.1096, beats_loss=0.008118, ecapa_loss=0.0001448, whisper_loss=0.1, over 17193.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001418, whisper_loss=0.08949, over 3796815.49 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:52:31,138 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 04:52:37,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4667300.0, ans=0.05 2024-08-20 04:52:42,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4667300.0, ans=0.0 2024-08-20 04:52:50,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.449e+01 2.717e+01 4.858e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:53:02,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4667400.0, ans=0.125 2024-08-20 04:53:06,124 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
26 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 04:53:28,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4667500.0, ans=0.125 2024-08-20 04:53:40,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4667600.0, ans=0.0 2024-08-20 04:53:42,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4667600.0, ans=0.0 2024-08-20 04:53:50,685 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 04:53:52,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4667600.0, ans=0.05 2024-08-20 04:54:20,631 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7400, loss[loss=0.09873, beats_loss=0.008998, ecapa_loss=0.0001133, whisper_loss=0.0886, over 17009.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09018, over 3799784.98 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:54:27,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4667800.0, ans=0.0 2024-08-20 04:54:43,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-08-20 04:54:45,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4667900.0, ans=0.125 2024-08-20 04:55:02,008 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 04:55:29,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4668100.0, ans=0.05 2024-08-20 04:55:31,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4668100.0, ans=0.125 2024-08-20 04:55:40,503 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 04:55:44,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4668100.0, ans=0.125 2024-08-20 04:55:47,649 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 04:55:51,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4668100.0, ans=0.1 2024-08-20 04:56:13,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4668200.0, ans=0.125 2024-08-20 04:56:13,106 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 04:56:13,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4668200.0, ans=0.1 2024-08-20 04:56:18,308 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7450, loss[loss=0.0768, beats_loss=0.009802, ecapa_loss=0.0001325, whisper_loss=0.06567, over 17153.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001395, whisper_loss=0.08948, over 3751574.28 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:56:29,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. 
limit=15.0 2024-08-20 04:56:39,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.202e+01 2.465e+01 2.731e+01 3.799e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-20 04:57:15,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4668500.0, ans=10.0 2024-08-20 04:57:51,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4668700.0, ans=0.125 2024-08-20 04:57:56,674 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 21 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 04:57:58,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2024-08-20 04:57:59,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4668700.0, ans=0.1 2024-08-20 04:58:12,398 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7500, loss[loss=0.124, beats_loss=0.009625, ecapa_loss=0.0001521, whisper_loss=0.1129, over 23238.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.00014, whisper_loss=0.09035, over 3774534.75 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:58:18,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4668800.0, ans=0.0 2024-08-20 04:58:24,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4668800.0, ans=0.125 2024-08-20 04:59:00,782 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
38 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 04:59:14,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4669000.0, ans=0.1 2024-08-20 04:59:14,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4669000.0, ans=0.125 2024-08-20 04:59:19,291 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 28 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 04:59:40,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-20 04:59:42,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4669200.0, ans=0.125 2024-08-20 04:59:51,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4669200.0, ans=0.125 2024-08-20 04:59:59,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4669200.0, ans=0.0 2024-08-20 05:00:05,038 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7550, loss[loss=0.06963, beats_loss=0.01601, ecapa_loss=0.0001081, whisper_loss=0.05253, over 14728.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08963, over 3779313.77 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:00:07,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. 
limit=12.0 2024-08-20 05:00:12,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4669300.0, ans=0.2 2024-08-20 05:00:20,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.90 vs. limit=22.5 2024-08-20 05:00:22,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.290e+01 2.609e+01 3.060e+01 2.674e+02, threshold=5.218e+01, percent-clipped=1.0 2024-08-20 05:00:50,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4669500.0, ans=0.2 2024-08-20 05:00:51,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4669500.0, ans=0.0 2024-08-20 05:01:07,789 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 05:01:14,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-08-20 05:01:52,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4669700.0, ans=0.1 2024-08-20 05:01:52,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4669700.0, ans=0.125 2024-08-20 05:01:55,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4669700.0, ans=0.125 2024-08-20 05:01:57,856 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7600, loss[loss=0.08525, beats_loss=0.01194, ecapa_loss=0.0001158, whisper_loss=0.07215, over 13517.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001416, whisper_loss=0.08935, over 3801876.39 frames. 
], batch size: 53, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:02:06,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=12.0 2024-08-20 05:02:08,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4669800.0, ans=0.125 2024-08-20 05:02:22,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4669900.0, ans=0.5 2024-08-20 05:03:02,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4670000.0, ans=0.0 2024-08-20 05:03:10,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4670100.0, ans=0.0 2024-08-20 05:03:15,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2024-08-20 05:03:36,019 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 05:03:39,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0 2024-08-20 05:03:48,114 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7650, loss[loss=0.0857, beats_loss=0.009225, ecapa_loss=0.0001629, whisper_loss=0.07485, over 18203.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.0001429, whisper_loss=0.08959, over 3799660.63 frames. 
], batch size: 78, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:04:08,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.328e+01 2.537e+01 2.838e+01 5.582e+01, threshold=5.074e+01, percent-clipped=1.0 2024-08-20 05:04:16,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.31 vs. limit=5.0 2024-08-20 05:04:17,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4670400.0, ans=0.125 2024-08-20 05:04:46,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4670500.0, ans=0.05 2024-08-20 05:05:21,014 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-20 05:05:32,579 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7700, loss[loss=0.09642, beats_loss=0.009329, ecapa_loss=0.0001467, whisper_loss=0.08562, over 18803.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01039, ecapa_loss=0.000142, whisper_loss=0.08866, over 3806619.76 frames. ], batch size: 76, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:05:50,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4670800.0, ans=0.125 2024-08-20 05:05:56,195 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 05:05:58,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4670900.0, ans=0.1 2024-08-20 05:06:03,510 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 05:06:52,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=15.0 2024-08-20 05:06:55,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2024-08-20 05:07:00,879 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 32 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 05:07:10,474 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 05:07:28,237 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7750, loss[loss=0.1196, beats_loss=0.01055, ecapa_loss=0.0001291, whisper_loss=0.1077, over 23594.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.08919, over 3803975.63 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:07:49,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.264e+01 2.430e+01 2.732e+01 4.233e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-20 05:09:00,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4671600.0, ans=0.0 2024-08-20 05:09:06,406 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 05:09:11,567 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-20 05:09:31,022 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7800, loss[loss=0.1042, beats_loss=0.01173, ecapa_loss=0.0001076, whisper_loss=0.09144, over 16256.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.089, over 3834286.78 frames. 
], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:09:32,657 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 24 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-20 05:09:34,638 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 05:09:41,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4671800.0, ans=0.0 2024-08-20 05:09:45,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4671800.0, ans=0.0 2024-08-20 05:09:48,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=12.0 2024-08-20 05:10:22,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4672000.0, ans=0.1 2024-08-20 05:11:26,340 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7850, loss[loss=0.1042, beats_loss=0.008596, ecapa_loss=0.0001519, whisper_loss=0.09407, over 15219.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001406, whisper_loss=0.08977, over 3840492.46 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:11:38,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4672300.0, ans=0.05 2024-08-20 05:11:39,059 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
28 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-20 05:11:46,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.521e+01 2.830e+01 3.600e+01, threshold=5.042e+01, percent-clipped=0.0 2024-08-20 05:11:54,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4672400.0, ans=0.0 2024-08-20 05:12:00,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4672400.0, ans=0.2 2024-08-20 05:12:05,787 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 05:12:25,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4672500.0, ans=0.2 2024-08-20 05:12:28,599 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 30 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 05:12:42,419 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 05:13:14,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.05 vs. limit=22.5 2024-08-20 05:13:14,791 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7900, loss[loss=0.1049, beats_loss=0.007425, ecapa_loss=0.0001667, whisper_loss=0.09584, over 12968.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001403, whisper_loss=0.08956, over 3837700.35 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:13:20,229 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 
13 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 05:13:41,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4672900.0, ans=0.05 2024-08-20 05:13:58,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4673000.0, ans=0.125 2024-08-20 05:14:23,473 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 05:14:30,266 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 27 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 05:14:53,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4673200.0, ans=0.04949747468305833 2024-08-20 05:15:08,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.03 vs. limit=15.0 2024-08-20 05:15:08,738 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 7950, loss[loss=0.1211, beats_loss=0.008466, ecapa_loss=0.0001959, whisper_loss=0.1107, over 20621.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001405, whisper_loss=0.08921, over 3819251.06 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:15:28,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.672e+01 2.282e+01 2.544e+01 2.823e+01 6.203e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-20 05:15:36,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4673400.0, ans=0.1 2024-08-20 05:15:43,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. 
limit=10.0 2024-08-20 05:15:45,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4673400.0, ans=0.125 2024-08-20 05:16:00,689 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 22 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-20 05:16:35,872 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 31 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 05:16:37,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=12.0 2024-08-20 05:16:48,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4673700.0, ans=0.125 2024-08-20 05:16:57,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.89 vs. limit=22.5 2024-08-20 05:16:57,537 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8000, loss[loss=0.1058, beats_loss=0.01005, ecapa_loss=0.0001589, whisper_loss=0.0942, over 23668.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001413, whisper_loss=0.0894, over 3829198.69 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:17:11,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4673800.0, ans=0.125 2024-08-20 05:17:27,382 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 29 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-20 05:17:29,409 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-20 05:17:44,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4674000.0, ans=0.125 2024-08-20 05:17:45,908 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 05:18:34,775 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 05:18:41,453 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8050, loss[loss=0.1075, beats_loss=0.008986, ecapa_loss=0.0001777, whisper_loss=0.09678, over 15486.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001419, whisper_loss=0.09001, over 3812023.39 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:18:42,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-08-20 05:18:55,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4674300.0, ans=0.125 2024-08-20 05:18:59,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.351e+01 2.548e+01 2.857e+01 8.304e+01, threshold=5.095e+01, percent-clipped=2.0 2024-08-20 05:19:09,450 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 05:19:15,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4674400.0, ans=0.0 2024-08-20 05:19:56,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4674600.0, ans=0.125 2024-08-20 05:19:57,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4674600.0, ans=0.0 2024-08-20 05:20:16,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. 
limit=22.5 2024-08-20 05:20:20,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4674700.0, ans=0.1 2024-08-20 05:20:32,107 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8100, loss[loss=0.1102, beats_loss=0.009246, ecapa_loss=0.0001617, whisper_loss=0.09933, over 14617.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.08929, over 3812217.62 frames. ], batch size: 59, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:21:15,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0 2024-08-20 05:21:44,522 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 22 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 05:21:54,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4675100.0, ans=0.125 2024-08-20 05:22:18,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4675200.0, ans=0.0 2024-08-20 05:22:25,237 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8150, loss[loss=0.1343, beats_loss=0.007418, ecapa_loss=0.0001534, whisper_loss=0.1253, over 14194.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.08985, over 3836113.24 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:22:34,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4675300.0, ans=0.2 2024-08-20 05:22:44,766 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 05:22:47,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.184e+01 2.427e+01 2.667e+01 4.030e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-20 05:23:15,208 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 05:23:19,374 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-20 05:23:21,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4675500.0, ans=0.2 2024-08-20 05:23:25,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4675500.0, ans=0.125 2024-08-20 05:23:45,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4675600.0, ans=0.0 2024-08-20 05:24:05,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4675700.0, ans=0.0 2024-08-20 05:24:05,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-20 05:24:21,745 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8200, loss[loss=0.1012, beats_loss=0.009781, ecapa_loss=0.000156, whisper_loss=0.08982, over 21248.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001401, whisper_loss=0.08967, over 3809517.57 frames. 
], batch size: 89, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:24:23,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4675800.0, ans=0.125 2024-08-20 05:24:29,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4675800.0, ans=0.125 2024-08-20 05:24:36,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4675800.0, ans=0.0 2024-08-20 05:24:45,109 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 05:24:46,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4675900.0, ans=0.125 2024-08-20 05:25:08,717 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 38 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 05:25:34,275 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03858109936118126, model_norm_threshold=48.536659240722656 2024-08-20 05:25:34,440 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.171e+05, grad_sumsq=2.171e+05, orig_rms_sq=1.000e+00 2024-08-20 05:25:34,748 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 24 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 05:25:43,356 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.539e-02 2024-08-20 05:25:43,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.99 vs. limit=10.0 2024-08-20 05:25:51,264 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
14 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 05:25:52,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4676100.0, ans=0.1 2024-08-20 05:26:05,516 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 21 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 05:26:15,889 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8250, loss[loss=0.07352, beats_loss=0.01273, ecapa_loss=0.0001438, whisper_loss=0.05935, over 19730.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.09012, over 3823556.06 frames. ], batch size: 84, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:26:36,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.406e+01 2.629e+01 3.126e+01 1.258e+03, threshold=5.257e+01, percent-clipped=4.0 2024-08-20 05:26:46,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-20 05:26:55,713 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 05:27:27,638 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 05:27:29,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4676600.0, ans=0.035 2024-08-20 05:28:07,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4676700.0, ans=0.1 2024-08-20 05:28:15,366 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8300, loss[loss=0.1168, beats_loss=0.01007, ecapa_loss=0.0001347, whisper_loss=0.1054, over 23065.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.000141, whisper_loss=0.08922, over 3777897.88 frames. 
], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:28:17,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4676800.0, ans=0.1 2024-08-20 05:28:45,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4676900.0, ans=0.125 2024-08-20 05:29:16,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4677000.0, ans=0.125 2024-08-20 05:29:25,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4677100.0, ans=0.125 2024-08-20 05:29:43,218 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-20 05:30:07,545 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 05:30:08,655 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8350, loss[loss=0.1157, beats_loss=0.008788, ecapa_loss=0.0001896, whisper_loss=0.105, over 15650.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001408, whisper_loss=0.08918, over 3811698.54 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:30:17,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. 
limit=15.0 2024-08-20 05:30:26,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.380e+01 2.610e+01 3.013e+01 5.449e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-20 05:30:32,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4677400.0, ans=0.125 2024-08-20 05:30:36,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4677400.0, ans=0.125 2024-08-20 05:30:52,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4677500.0, ans=0.0 2024-08-20 05:31:10,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4677600.0, ans=0.0 2024-08-20 05:31:21,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4677600.0, ans=0.125 2024-08-20 05:31:22,453 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 32 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-20 05:31:24,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4677600.0, ans=0.09899494936611666 2024-08-20 05:31:29,964 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 11 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 05:31:48,903 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8400, loss[loss=0.0823, beats_loss=0.01109, ecapa_loss=0.0001474, whisper_loss=0.06974, over 12591.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001403, whisper_loss=0.08958, over 3807632.06 frames. ], batch size: 51, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:31:50,871 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 
26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 05:31:53,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4677800.0, ans=0.07 2024-08-20 05:32:16,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4677900.0, ans=0.125 2024-08-20 05:32:22,000 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 20 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 05:32:31,975 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 05:32:36,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4678000.0, ans=0.125 2024-08-20 05:33:06,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4678100.0, ans=0.125 2024-08-20 05:33:07,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4678100.0, ans=0.125 2024-08-20 05:33:09,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4678100.0, ans=0.07 2024-08-20 05:33:41,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4678200.0, ans=0.125 2024-08-20 05:33:44,284 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8450, loss[loss=0.1085, beats_loss=0.01264, ecapa_loss=0.0001072, whisper_loss=0.09477, over 23625.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01053, ecapa_loss=0.0001397, whisper_loss=0.08915, over 3806195.66 frames. 
], batch size: 88, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:34:03,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.232e+01 2.452e+01 2.661e+01 1.500e+02, threshold=4.905e+01, percent-clipped=2.0 2024-08-20 05:34:19,517 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 05:34:31,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4678500.0, ans=0.125 2024-08-20 05:34:33,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4678500.0, ans=0.09899494936611666 2024-08-20 05:34:33,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4678500.0, ans=0.1 2024-08-20 05:34:37,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4678500.0, ans=0.0 2024-08-20 05:34:41,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=12.0 2024-08-20 05:34:44,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4678500.0, ans=0.1 2024-08-20 05:35:21,028 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.340e+05 2024-08-20 05:35:35,745 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8500, loss[loss=0.09865, beats_loss=0.01201, ecapa_loss=0.0001565, whisper_loss=0.08507, over 15607.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001406, whisper_loss=0.08911, over 3784201.42 frames. 
], batch size: 64, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:35:37,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4678800.0, ans=0.2 2024-08-20 05:35:42,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4678800.0, ans=0.125 2024-08-20 05:35:48,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4678800.0, ans=0.95 2024-08-20 05:36:42,491 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.69 vs. limit=22.5 2024-08-20 05:36:44,968 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-20 05:37:00,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4679100.0, ans=0.125 2024-08-20 05:37:08,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4679200.0, ans=0.125 2024-08-20 05:37:16,202 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 05:37:18,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4679200.0, ans=0.0 2024-08-20 05:37:28,712 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 05:37:30,696 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8550, loss[loss=0.1071, beats_loss=0.009786, ecapa_loss=0.0001487, whisper_loss=0.09586, over 15209.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001411, whisper_loss=0.09015, over 3825330.86 frames. 
], batch size: 59, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:37:42,046 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 05:37:50,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.232e+01 2.501e+01 2.726e+01 3.621e+01, threshold=5.003e+01, percent-clipped=0.0
2024-08-20 05:37:53,380 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 from AS
2024-08-20 05:38:04,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4679400.0, ans=0.125
2024-08-20 05:38:05,592 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 from AS
2024-08-20 05:38:11,502 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-20 05:38:14,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4679400.0, ans=0.125
2024-08-20 05:38:21,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4679500.0, ans=0.125
2024-08-20 05:38:26,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4679500.0, ans=0.0
2024-08-20 05:38:29,948 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 from AS
2024-08-20 05:38:33,815 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 from AS
2024-08-20 05:38:44,148 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 24 from LS+wenet, 30 from Vox, 39 from AS
2024-08-20 05:39:22,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4679700.0, ans=0.09899494936611666
2024-08-20 05:39:24,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4679700.0, ans=10.0
2024-08-20 05:39:27,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=12.0
2024-08-20 05:39:27,863 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8600, loss[loss=0.1055, beats_loss=0.009133, ecapa_loss=0.0001165, whisper_loss=0.09523, over 14884.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001409, whisper_loss=0.09075, over 3799284.60 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:39:32,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0
2024-08-20 05:39:44,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4679800.0, ans=0.125
2024-08-20 05:39:52,651 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 from AS
2024-08-20 05:39:56,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4679900.0, ans=0.025
2024-08-20 05:40:21,596 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS
2024-08-20 05:41:12,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4680200.0, ans=0.125
2024-08-20 05:41:12,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4680200.0, ans=0.1
2024-08-20 05:41:17,516 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8650, loss[loss=0.1065, beats_loss=0.009938, ecapa_loss=0.000114, whisper_loss=0.09543, over 23292.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001397, whisper_loss=0.09005, over 3817406.24 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:41:33,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4680300.0, ans=0.125
2024-08-20 05:41:38,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0
2024-08-20 05:41:39,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.245e+01 2.496e+01 2.765e+01 3.926e+01, threshold=4.992e+01, percent-clipped=0.0
2024-08-20 05:41:43,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4680400.0, ans=0.125
2024-08-20 05:42:01,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4680400.0, ans=0.125
2024-08-20 05:42:09,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4680500.0, ans=22.5
2024-08-20 05:42:10,823 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-20 05:42:30,896 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 12 from LS+wenet, 15 from Vox, 31 from AS
2024-08-20 05:43:03,928 WARNING [optim.py:496] (3/4) Scaling gradients by 0.040249649435281754, model_norm_threshold=49.920475006103516
2024-08-20 05:43:04,096 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.371e+05, grad_sumsq=4.172e+04, orig_rms_sq=3.286e+00
2024-08-20 05:43:15,061 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8700, loss[loss=0.1005, beats_loss=0.00923, ecapa_loss=0.0001428, whisper_loss=0.08984, over 15332.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001401, whisper_loss=0.09017, over 3843713.06 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:44:03,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4681000.0, ans=0.0
2024-08-20 05:44:18,838 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS
2024-08-20 05:44:38,299 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 28 from LS+wenet, 20 from Vox, 23 from AS
2024-08-20 05:44:42,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4681100.0, ans=0.125
2024-08-20 05:44:46,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4681200.0, ans=0.07
2024-08-20 05:44:51,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4681200.0, ans=0.125
2024-08-20 05:45:05,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5
2024-08-20 05:45:10,992 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8750, loss[loss=0.08991, beats_loss=0.008939, ecapa_loss=0.0001669, whisper_loss=0.07931, over 14449.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01028, ecapa_loss=0.0001409, whisper_loss=0.09077, over 3830454.18 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:45:17,400 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 from AS
2024-08-20 05:45:19,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4681300.0, ans=0.125
2024-08-20 05:45:28,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4681300.0, ans=0.125
2024-08-20 05:45:32,012 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.309e+01 2.539e+01 2.876e+01 1.240e+03, threshold=5.077e+01, percent-clipped=3.0
2024-08-20 05:46:33,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4681600.0, ans=10.0
2024-08-20 05:46:42,894 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 from AS
2024-08-20 05:46:44,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4681700.0, ans=0.0
2024-08-20 05:47:03,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0
2024-08-20 05:47:03,872 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8800, loss[loss=0.1089, beats_loss=0.01088, ecapa_loss=0.0001229, whisper_loss=0.09678, over 23450.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001406, whisper_loss=0.09034, over 3855068.84 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:47:25,317 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0940530002117157, model_norm_threshold=50.77210235595703
2024-08-20 05:47:25,484 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.105e+04, grad_sumsq=6.105e+04, orig_rms_sq=1.000e+00
2024-08-20 05:47:55,932 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 from AS
2024-08-20 05:48:07,307 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 from AS
2024-08-20 05:48:10,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0
2024-08-20 05:48:25,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4682200.0, ans=0.125
2024-08-20 05:48:42,400 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8850, loss[loss=0.06682, beats_loss=0.01243, ecapa_loss=0.0001009, whisper_loss=0.05339, over 13851.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.0902, over 3842876.81 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:48:45,033 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 from AS
2024-08-20 05:48:50,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4682300.0, ans=0.125
2024-08-20 05:48:58,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0
2024-08-20 05:48:58,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.309e+01 2.540e+01 2.877e+01 5.398e+02, threshold=5.080e+01, percent-clipped=3.0
2024-08-20 05:49:04,488 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 27 from LS+wenet, 25 from Vox, 35 from AS
2024-08-20 05:49:05,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4682400.0, ans=0.1
2024-08-20 05:49:17,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4682400.0, ans=0.0
2024-08-20 05:49:41,008 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 18 from LS+wenet, 11 from Vox, 29 from AS
2024-08-20 05:49:48,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4682600.0, ans=0.04949747468305833
2024-08-20 05:49:50,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4682600.0, ans=0.1
2024-08-20 05:50:05,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4682700.0, ans=10.0
2024-08-20 05:50:05,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4682700.0, ans=0.09899494936611666
2024-08-20 05:50:16,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4682700.0, ans=0.0
2024-08-20 05:50:18,904 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8900, loss[loss=0.07864, beats_loss=0.01152, ecapa_loss=0.000143, whisper_loss=0.06569, over 17509.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001399, whisper_loss=0.09009, over 3839961.30 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:50:34,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=15.0
2024-08-20 05:50:47,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4682900.0, ans=0.125
2024-08-20 05:50:47,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4682900.0, ans=15.0
2024-08-20 05:50:49,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.54 vs. limit=10.0
2024-08-20 05:50:52,401 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 16 from LS+wenet, 22 from Vox, 54 from AS
2024-08-20 05:51:17,678 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 from AS
2024-08-20 05:51:21,740 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 16 from LS+wenet, 12 from Vox, 35 from AS
2024-08-20 05:51:51,902 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 20 from LS+wenet, 26 from Vox, 34 from AS
2024-08-20 05:51:55,968 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 8950, loss[loss=0.1037, beats_loss=0.01116, ecapa_loss=8.897e-05, whisper_loss=0.09168, over 18798.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.09011, over 3853668.37 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 05:52:04,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4683300.0, ans=0.2
2024-08-20 05:52:12,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.265e+01 2.516e+01 2.733e+01 4.609e+01, threshold=5.033e+01, percent-clipped=0.0
2024-08-20 05:52:17,842 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 from AS
2024-08-20 05:53:01,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4683600.0, ans=0.1
2024-08-20 05:53:04,575 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 27 from LS+wenet, 22 from Vox, 37 from AS
2024-08-20 05:53:06,357 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 22 from Vox, 30 from AS
2024-08-20 05:53:07,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4683700.0, ans=0.1
2024-08-20 05:53:14,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4683700.0, ans=0.1
2024-08-20 05:53:25,540 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9000, loss[loss=0.08284, beats_loss=0.01011, ecapa_loss=0.0001596, whisper_loss=0.07114, over 12797.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.08944, over 3842758.98 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 1.152921504606847e+18
2024-08-20 05:53:25,541 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss
2024-08-20 05:54:02,496 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2489, over 931116.00 frames.
2024-08-20 05:54:24,116 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on SV_voxceleb1: loss=0.004011, beats_loss=0, ecapa_loss=0.0004011, whisper_loss=0, over 944235.00 frames.
2024-08-20 05:56:01,378 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on AT_audioset: loss=0.02303, beats_loss=0.02303, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-20 05:56:01,381 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB
2024-08-20 05:56:01,627 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 from AS
2024-08-20 05:56:38,501 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 from AS
2024-08-20 05:56:43,226 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 from AS
2024-08-20 05:56:51,485 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 05:56:56,838 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0
2024-08-20 05:57:24,007 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9050, loss[loss=0.1322, beats_loss=0.007114, ecapa_loss=0.0001671, whisper_loss=0.1234, over 16743.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001391, whisper_loss=0.08938, over 3810412.18 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 1.152921504606847e+18
2024-08-20 05:57:33,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4684300.0, ans=0.125
2024-08-20 05:57:38,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.206e+01 2.470e+01 2.742e+01 4.296e+01, threshold=4.939e+01, percent-clipped=0.0
2024-08-20 05:57:44,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4684400.0, ans=0.05
2024-08-20 05:57:58,949 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 from AS
2024-08-20 05:58:13,947 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS
2024-08-20 05:58:21,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4684600.0, ans=0.1
2024-08-20 05:58:21,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.73 vs. limit=5.0
2024-08-20 05:58:45,892 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9100, loss[loss=0.1032, beats_loss=0.009695, ecapa_loss=0.0001641, whisper_loss=0.0919, over 16829.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001399, whisper_loss=0.08956, over 3804562.84 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 1.152921504606847e+18
2024-08-20 05:58:49,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4684800.0, ans=0.2
2024-08-20 05:58:56,857 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 13 from LS+wenet, 16 from Vox, 39 from AS
2024-08-20 05:59:03,421 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 from AS
2024-08-20 05:59:28,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4685000.0, ans=0.2
2024-08-20 05:59:29,637 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 from AS
2024-08-20 05:59:38,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4685100.0, ans=0.0
2024-08-20 05:59:40,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2024-08-20 06:00:05,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4685200.0, ans=0.1
2024-08-20 06:00:10,214 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9150, loss[loss=0.1008, beats_loss=0.009944, ecapa_loss=0.0001527, whisper_loss=0.08936, over 16301.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001392, whisper_loss=0.08926, over 3804115.14 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:00:12,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4685300.0, ans=0.0
2024-08-20 06:00:16,993 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 from AS
2024-08-20 06:00:27,035 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.310e+01 2.550e+01 2.853e+01 1.227e+02, threshold=5.100e+01, percent-clipped=1.0
2024-08-20 06:00:32,263 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 from AS
2024-08-20 06:00:49,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0
2024-08-20 06:01:00,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4685600.0, ans=0.1
2024-08-20 06:01:10,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4685600.0, ans=0.0
2024-08-20 06:01:10,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=12.0
2024-08-20 06:01:35,571 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9200, loss[loss=0.1167, beats_loss=0.00728, ecapa_loss=0.000161, whisper_loss=0.1078, over 15708.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001389, whisper_loss=0.0894, over 3789279.34 frames. ], batch size: 61, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:01:50,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4685800.0, ans=0.0
2024-08-20 06:02:03,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4685900.0, ans=0.1
2024-08-20 06:02:10,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4686000.0, ans=0.125
2024-08-20 06:02:13,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4686000.0, ans=0.125
2024-08-20 06:02:17,826 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 from AS
2024-08-20 06:02:51,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4686200.0, ans=0.0
2024-08-20 06:02:56,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4686200.0, ans=0.125
2024-08-20 06:03:02,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4686300.0, ans=0.125
2024-08-20 06:03:02,853 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9250, loss[loss=0.1076, beats_loss=0.009128, ecapa_loss=0.0001485, whisper_loss=0.09703, over 18516.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08973, over 3785652.80 frames. ], batch size: 73, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:03:20,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.318e+01 2.500e+01 2.733e+01 3.571e+01, threshold=4.999e+01, percent-clipped=0.0
2024-08-20 06:03:45,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4686500.0, ans=0.0
2024-08-20 06:03:50,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4686500.0, ans=0.0
2024-08-20 06:03:53,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4686500.0, ans=0.125
2024-08-20 06:03:57,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4686600.0, ans=0.1
2024-08-20 06:04:04,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4686600.0, ans=0.1
2024-08-20 06:04:05,337 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 36 from LS+wenet, 16 from Vox, 37 from AS
2024-08-20 06:04:17,823 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 20 from LS+wenet, 28 from Vox, 44 from AS
2024-08-20 06:04:26,332 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.223e+01
2024-08-20 06:04:31,237 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9300, loss[loss=0.1063, beats_loss=0.009492, ecapa_loss=0.0001227, whisper_loss=0.09557, over 14727.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001406, whisper_loss=0.08937, over 3762987.56 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:04:58,189 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS
2024-08-20 06:05:07,334 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 from AS
2024-08-20 06:05:47,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4687200.0, ans=0.0
2024-08-20 06:05:50,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0
2024-08-20 06:06:05,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4687200.0, ans=0.125
2024-08-20 06:06:06,745 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 17 from Vox, 38 from AS
2024-08-20 06:06:08,692 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9350, loss[loss=0.1232, beats_loss=0.01014, ecapa_loss=0.0001157, whisper_loss=0.1119, over 23393.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001396, whisper_loss=0.08989, over 3804419.95 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:06:28,518 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.303e+01 2.586e+01 2.791e+01 3.756e+01, threshold=5.173e+01, percent-clipped=0.0
2024-08-20 06:06:36,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4687400.0, ans=0.1
2024-08-20 06:06:43,125 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 from AS
2024-08-20 06:06:49,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0
2024-08-20 06:07:16,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4687600.0, ans=0.1
2024-08-20 06:07:36,257 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 from AS
2024-08-20 06:07:39,647 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9400, loss[loss=0.1076, beats_loss=0.009223, ecapa_loss=0.0001633, whisper_loss=0.09674, over 14300.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001419, whisper_loss=0.0898, over 3798772.22 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:07:43,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4687800.0, ans=0.125
2024-08-20 06:08:00,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0
2024-08-20 06:08:06,005 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 20 from Vox, 19 from AS
2024-08-20 06:08:21,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4688000.0, ans=0.0
2024-08-20 06:08:27,541 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0
2024-08-20 06:08:31,145 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 06:08:59,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.07 vs. limit=6.0
2024-08-20 06:09:08,344 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9450, loss[loss=0.1004, beats_loss=0.01107, ecapa_loss=0.0001298, whisper_loss=0.088, over 23249.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001424, whisper_loss=0.08968, over 3844794.63 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:09:27,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.376e+01 2.594e+01 2.934e+01 1.922e+02, threshold=5.189e+01, percent-clipped=1.0
2024-08-20 06:09:28,136 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 33 from LS+wenet, 21 from Vox, 42 from AS
2024-08-20 06:09:33,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4688400.0, ans=0.035
2024-08-20 06:09:33,910 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.526e+00
2024-08-20 06:09:37,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4688400.0, ans=15.0
2024-08-20 06:09:45,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4688500.0, ans=0.1
2024-08-20 06:09:54,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4688500.0, ans=0.0
2024-08-20 06:10:04,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4688600.0, ans=0.125
2024-08-20 06:10:06,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0
2024-08-20 06:10:26,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4688700.0, ans=0.125
2024-08-20 06:10:28,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4688700.0, ans=0.2
2024-08-20 06:10:36,263 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9500, loss[loss=0.09538, beats_loss=0.01108, ecapa_loss=0.0001432, whisper_loss=0.08286, over 22366.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001425, whisper_loss=0.09005, over 3803917.92 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:10:48,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4688800.0, ans=0.125
2024-08-20 06:10:48,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=12.0
2024-08-20 06:10:49,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4688800.0, ans=0.0
2024-08-20 06:10:57,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4688900.0, ans=0.0
2024-08-20 06:11:12,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4689000.0, ans=0.125
2024-08-20 06:11:18,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4689000.0, ans=0.0
2024-08-20 06:11:27,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4689100.0, ans=0.035
2024-08-20 06:11:56,693 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 from AS
2024-08-20 06:12:03,406 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9550, loss[loss=0.09027, beats_loss=0.01164, ecapa_loss=0.000121, whisper_loss=0.07742, over 19393.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001415, whisper_loss=0.08976, over 3790975.50 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:12:05,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4689300.0, ans=0.125
2024-08-20 06:12:09,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4689300.0, ans=0.0
2024-08-20 06:12:10,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=22.5
2024-08-20 06:12:18,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4689300.0, ans=0.0
2024-08-20 06:12:21,199 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.271e+01 2.487e+01 2.797e+01 1.341e+02, threshold=4.974e+01, percent-clipped=1.0
2024-08-20 06:12:30,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4689400.0, ans=0.125
2024-08-20 06:12:39,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4689500.0, ans=0.0
2024-08-20 06:12:54,535 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 35 from LS+wenet, 16 from Vox, 27 from AS
2024-08-20 06:13:19,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.14 vs. limit=6.0
2024-08-20 06:13:32,317 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9600, loss[loss=0.1062, beats_loss=0.00809, ecapa_loss=0.0001408, whisper_loss=0.09674, over 16287.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001397, whisper_loss=0.08935, over 3805500.91 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17
2024-08-20 06:13:43,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4689800.0, ans=0.125
2024-08-20 06:13:52,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4689900.0, ans=0.1
2024-08-20 06:13:55,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4689900.0, ans=0.0
2024-08-20 06:14:23,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4690000.0, ans=0.125
2024-08-20 06:14:48,932 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 33 from LS+wenet, 29 from Vox, 34 from AS
2024-08-20 06:15:02,822 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 from AS
2024-08-20 06:15:06,762 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9650, loss[loss=0.09222, beats_loss=0.01341, ecapa_loss=0.0001334, whisper_loss=0.07747, over 18863.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001406, whisper_loss=0.08943, over 3834478.12 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 06:15:08,586 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 21 from LS+wenet, 24 from Vox, 42 from AS
2024-08-20 06:15:12,630 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 from AS
2024-08-20 06:15:18,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0
2024-08-20 06:15:26,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.779e+01 3.042e+01 4.169e+01, threshold=5.558e+01, percent-clipped=0.0
2024-08-20 06:15:30,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4690400.0, ans=0.125
2024-08-20 06:15:30,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4690400.0, ans=0.0
2024-08-20 06:15:45,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4690500.0, ans=0.125
2024-08-20 06:15:54,758 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 19 from LS+wenet, 18 from Vox, 39 from AS
2024-08-20 06:15:55,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4690500.0, ans=0.125
2024-08-20 06:15:55,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4690500.0, ans=0.1
2024-08-20 06:15:55,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0
2024-08-20 06:16:05,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4690600.0, ans=0.125
2024-08-20 06:16:15,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4690700.0, ans=0.125
2024-08-20 06:16:32,801 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9700, loss[loss=0.1064, beats_loss=0.009692, ecapa_loss=0.0001541, whisper_loss=0.09516, over 12684.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001407, whisper_loss=0.08897, over 3839969.12 frames. 
], batch size: 49, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:16:49,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4690900.0, ans=0.125 2024-08-20 06:16:57,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4690900.0, ans=0.125 2024-08-20 06:17:07,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4691000.0, ans=0.0 2024-08-20 06:17:07,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4691000.0, ans=0.1 2024-08-20 06:17:20,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4691100.0, ans=0.0 2024-08-20 06:17:28,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4691100.0, ans=0.1 2024-08-20 06:17:28,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4691100.0, ans=0.125 2024-08-20 06:17:33,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4691100.0, ans=0.125 2024-08-20 06:17:41,733 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 06:17:54,439 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9750, loss[loss=0.09885, beats_loss=0.01101, ecapa_loss=0.0001353, whisper_loss=0.08649, over 16582.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.08852, over 3816763.19 frames. 
], batch size: 68, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:17:55,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4691300.0, ans=0.0 2024-08-20 06:18:08,052 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 06:18:12,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.245e+01 2.617e+01 2.841e+01 5.114e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-20 06:18:13,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4691400.0, ans=0.125 2024-08-20 06:18:19,305 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 06:18:26,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2024-08-20 06:18:45,522 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.193e-01 2024-08-20 06:18:49,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2024-08-20 06:18:59,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4691700.0, ans=0.0 2024-08-20 06:19:16,755 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9800, loss[loss=0.1094, beats_loss=0.008112, ecapa_loss=0.0001685, whisper_loss=0.09965, over 14853.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.000141, whisper_loss=0.08892, over 3791148.07 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:19:25,687 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 06:19:40,185 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 15 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-20 06:20:00,666 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 33 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-20 06:20:02,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4692000.0, ans=0.125 2024-08-20 06:20:03,846 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 22 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 06:20:05,425 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 06:20:18,930 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 06:20:29,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0 2024-08-20 06:20:39,627 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9850, loss[loss=0.09193, beats_loss=0.01042, ecapa_loss=0.0001238, whisper_loss=0.08028, over 13612.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001398, whisper_loss=0.08986, over 3768127.95 frames. ], batch size: 53, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:20:45,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4692300.0, ans=0.1 2024-08-20 06:20:50,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4692300.0, ans=0.125 2024-08-20 06:20:55,188 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 
19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 06:20:58,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.366e+01 2.568e+01 2.856e+01 6.259e+01, threshold=5.136e+01, percent-clipped=2.0 2024-08-20 06:21:03,141 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 06:21:03,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-20 06:21:11,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4692500.0, ans=0.1 2024-08-20 06:21:20,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2024-08-20 06:21:28,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4692600.0, ans=0.07 2024-08-20 06:21:29,813 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 21 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-20 06:22:01,672 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 19 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 06:22:02,855 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9900, loss[loss=0.1064, beats_loss=0.01123, ecapa_loss=0.0001184, whisper_loss=0.09396, over 16481.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.08918, over 3764196.06 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:22:11,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. 
limit=15.0 2024-08-20 06:22:25,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4692900.0, ans=0.1 2024-08-20 06:22:40,968 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 06:22:42,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4693000.0, ans=0.0 2024-08-20 06:22:44,295 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 06:22:47,558 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 06:22:54,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4693100.0, ans=0.07 2024-08-20 06:23:17,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4693200.0, ans=0.125 2024-08-20 06:23:24,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4693300.0, ans=0.0 2024-08-20 06:23:24,913 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 9950, loss[loss=0.1145, beats_loss=0.009966, ecapa_loss=0.0001285, whisper_loss=0.1032, over 19970.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001398, whisper_loss=0.08903, over 3746567.81 frames. ], batch size: 78, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:23:29,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4693300.0, ans=0.2 2024-08-20 06:23:34,741 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 06:23:37,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4693300.0, ans=0.125 2024-08-20 06:23:37,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4693300.0, ans=0.0 2024-08-20 06:23:42,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.243e+01 2.443e+01 2.710e+01 3.765e+01, threshold=4.885e+01, percent-clipped=0.0 2024-08-20 06:24:06,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4693500.0, ans=0.125 2024-08-20 06:24:16,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4693600.0, ans=0.1 2024-08-20 06:24:37,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4693700.0, ans=10.0 2024-08-20 06:24:39,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4693700.0, ans=0.125 2024-08-20 06:24:49,111 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10000, loss[loss=0.06824, beats_loss=0.01449, ecapa_loss=9.085e-05, whisper_loss=0.05284, over 17725.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01054, ecapa_loss=0.0001389, whisper_loss=0.08882, over 3754180.93 frames. 
], batch size: 70, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:24:53,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4693800.0, ans=0.0 2024-08-20 06:24:56,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4693800.0, ans=0.125 2024-08-20 06:25:03,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4693800.0, ans=0.1 2024-08-20 06:25:08,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4693900.0, ans=0.125 2024-08-20 06:25:18,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4693900.0, ans=0.125 2024-08-20 06:25:29,767 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 06:25:37,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4694000.0, ans=0.125 2024-08-20 06:25:44,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4694100.0, ans=0.1 2024-08-20 06:26:14,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4694200.0, ans=0.0 2024-08-20 06:26:19,067 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10050, loss[loss=0.07729, beats_loss=0.01057, ecapa_loss=0.0001347, whisper_loss=0.06537, over 16943.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001398, whisper_loss=0.08944, over 3782694.98 frames. 
], batch size: 69, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:26:31,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4694300.0, ans=0.1 2024-08-20 06:26:37,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.304e+01 2.607e+01 2.920e+01 4.346e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-20 06:26:39,593 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 19 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-20 06:26:46,017 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-20 06:27:08,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-20 06:27:10,131 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 06:27:15,337 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 14 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 06:27:20,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4694600.0, ans=0.0 2024-08-20 06:27:20,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4694600.0, ans=0.07 2024-08-20 06:27:26,425 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 06:27:47,736 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10100, loss[loss=0.1242, beats_loss=0.007814, ecapa_loss=0.0001307, whisper_loss=0.115, over 23206.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001402, whisper_loss=0.08935, over 3786796.20 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:27:56,412 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
35 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-20 06:27:59,574 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 06:28:11,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=12.0 2024-08-20 06:28:15,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4694900.0, ans=0.1 2024-08-20 06:28:31,725 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 23 from LS+wenet, 20 from Vox, 50 fro AS 2024-08-20 06:28:49,456 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 06:28:58,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4695100.0, ans=0.125 2024-08-20 06:29:01,020 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 06:29:06,225 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-20 06:29:08,851 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 06:29:14,751 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 16 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-20 06:29:22,550 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10150, loss[loss=0.1034, beats_loss=0.01219, ecapa_loss=0.0001169, whisper_loss=0.09, over 21865.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01057, ecapa_loss=0.0001394, whisper_loss=0.0889, over 3742177.54 frames. ], batch size: 85, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:29:40,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2024-08-20 06:29:44,565 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.187e+01 2.409e+01 2.808e+01 3.836e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-20 06:29:50,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4695400.0, ans=0.1 2024-08-20 06:30:14,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4695500.0, ans=0.125 2024-08-20 06:30:15,604 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 06:30:57,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4695700.0, ans=0.09899494936611666 2024-08-20 06:31:02,378 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10200, loss[loss=0.09317, beats_loss=0.01263, ecapa_loss=0.0001117, whisper_loss=0.07943, over 18730.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.0905, over 3775271.74 frames. ], batch size: 76, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:31:07,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4695800.0, ans=0.125 2024-08-20 06:31:26,112 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 23 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 06:31:38,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4695900.0, ans=0.125 2024-08-20 06:31:39,968 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 06:31:42,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4696000.0, ans=0.1 2024-08-20 06:31:56,678 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 06:32:07,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-20 06:32:12,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4696100.0, ans=0.05 2024-08-20 06:32:19,693 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-20 06:32:30,165 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 06:32:32,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4696200.0, ans=0.125 2024-08-20 06:32:39,477 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10250, loss[loss=0.112, beats_loss=0.01049, ecapa_loss=0.0001599, whisper_loss=0.09987, over 22289.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.000139, whisper_loss=0.09008, over 3782140.57 frames. 
], batch size: 92, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:32:55,746 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07058558613061905, model_norm_threshold=48.17802047729492 2024-08-20 06:32:55,913 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.542e+04, grad_sumsq=7.542e+04, orig_rms_sq=1.000e+00 2024-08-20 06:33:01,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.230e+01 2.493e+01 2.759e+01 6.825e+02, threshold=4.986e+01, percent-clipped=2.0 2024-08-20 06:33:11,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4696400.0, ans=0.0 2024-08-20 06:33:21,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4696500.0, ans=0.125 2024-08-20 06:33:37,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4696500.0, ans=0.0 2024-08-20 06:33:44,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4696600.0, ans=0.125 2024-08-20 06:33:54,081 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
19 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-20 06:33:59,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4696700.0, ans=0.025 2024-08-20 06:34:02,429 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07055973261594772, model_norm_threshold=49.85801315307617 2024-08-20 06:34:02,595 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.714e+04, grad_sumsq=4.714e+04, orig_rms_sq=1.000e+00 2024-08-20 06:34:08,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.21 vs. limit=15.0 2024-08-20 06:34:22,331 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10300, loss[loss=0.09828, beats_loss=0.01041, ecapa_loss=0.0001376, whisper_loss=0.0865, over 23117.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.09047, over 3816528.71 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:34:40,373 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 06:34:44,977 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 06:34:53,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4696900.0, ans=0.1 2024-08-20 06:35:47,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4697200.0, ans=0.125 2024-08-20 06:35:49,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.38 vs. 
limit=12.0 2024-08-20 06:35:55,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2024-08-20 06:36:04,312 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10350, loss[loss=0.08726, beats_loss=0.01009, ecapa_loss=0.0001362, whisper_loss=0.07581, over 17098.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.09062, over 3850046.71 frames. ], batch size: 67, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:36:27,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.261e+01 2.488e+01 2.810e+01 7.066e+02, threshold=4.977e+01, percent-clipped=3.0 2024-08-20 06:36:42,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4697400.0, ans=0.2 2024-08-20 06:36:45,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4697500.0, ans=0.1 2024-08-20 06:37:25,096 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 18 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-20 06:37:33,803 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 38 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 06:37:36,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4697700.0, ans=0.2 2024-08-20 06:37:38,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4697700.0, ans=0.125 2024-08-20 06:37:47,022 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10400, loss[loss=0.1283, beats_loss=0.009004, ecapa_loss=0.0001306, whisper_loss=0.118, over 19928.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001411, whisper_loss=0.08989, over 3831837.21 frames. 
], batch size: 76, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:38:19,380 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 29 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 06:38:29,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=12.0 2024-08-20 06:38:39,827 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 30 from LS+wenet, 13 from Vox, 15 fro AS 2024-08-20 06:38:46,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4698000.0, ans=0.125 2024-08-20 06:38:56,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4698100.0, ans=0.125 2024-08-20 06:39:31,462 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10450, loss[loss=0.103, beats_loss=0.01196, ecapa_loss=0.0001473, whisper_loss=0.08958, over 15360.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.000141, whisper_loss=0.09053, over 3826538.42 frames. 
], batch size: 63, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:39:48,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4698300.0, ans=0.0 2024-08-20 06:39:52,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4698400.0, ans=0.125 2024-08-20 06:39:53,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.263e+01 2.457e+01 2.777e+01 9.468e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-20 06:40:09,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4698400.0, ans=0.025 2024-08-20 06:40:18,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2024-08-20 06:40:33,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4698600.0, ans=0.1 2024-08-20 06:40:36,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4698600.0, ans=0.125 2024-08-20 06:40:37,325 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 06:40:55,926 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 06:40:56,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4698700.0, ans=0.125 2024-08-20 06:40:59,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.68 vs. 
limit=15.0 2024-08-20 06:41:00,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4698700.0, ans=0.0 2024-08-20 06:41:11,111 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10500, loss[loss=0.09987, beats_loss=0.009516, ecapa_loss=0.0001305, whisper_loss=0.08905, over 17298.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001415, whisper_loss=0.09059, over 3844514.93 frames. ], batch size: 67, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:41:17,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4698800.0, ans=0.125 2024-08-20 06:41:19,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4698800.0, ans=0.2 2024-08-20 06:41:45,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4698900.0, ans=0.1 2024-08-20 06:41:46,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4698900.0, ans=0.0 2024-08-20 06:41:55,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4699000.0, ans=0.125 2024-08-20 06:41:57,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2024-08-20 06:42:00,746 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 06:42:13,155 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 28 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-20 06:42:21,265 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
16 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 06:42:34,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4699200.0, ans=0.125 2024-08-20 06:42:54,632 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10550, loss[loss=0.1206, beats_loss=0.00996, ecapa_loss=0.0001223, whisper_loss=0.1094, over 24184.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01024, ecapa_loss=0.0001432, whisper_loss=0.09134, over 3828630.17 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:42:59,038 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 06:43:07,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4699300.0, ans=0.0 2024-08-20 06:43:17,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.308e+01 2.564e+01 2.826e+01 3.881e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-20 06:43:27,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4699400.0, ans=0.2 2024-08-20 06:43:33,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4699400.0, ans=0.1 2024-08-20 06:43:44,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. 
limit=15.0 2024-08-20 06:43:46,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4699500.0, ans=0.125 2024-08-20 06:44:27,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4699700.0, ans=0.0 2024-08-20 06:44:31,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4699700.0, ans=0.125 2024-08-20 06:44:38,387 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10600, loss[loss=0.08919, beats_loss=0.01135, ecapa_loss=0.0001202, whisper_loss=0.07664, over 22932.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01025, ecapa_loss=0.000143, whisper_loss=0.0908, over 3847873.46 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:44:44,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4699800.0, ans=0.125 2024-08-20 06:44:55,963 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 06:45:00,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4699900.0, ans=0.125 2024-08-20 06:45:10,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4699900.0, ans=0.125 2024-08-20 06:45:46,806 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 18 from LS+wenet, 21 from Vox, 52 fro AS 2024-08-20 06:45:50,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4700100.0, ans=0.1 2024-08-20 06:45:53,308 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
21 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 06:45:54,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4700100.0, ans=0.125 2024-08-20 06:46:24,873 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10650, loss[loss=0.1126, beats_loss=0.009183, ecapa_loss=0.0001602, whisper_loss=0.1018, over 21936.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001422, whisper_loss=0.09027, over 3844526.08 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:46:25,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4700300.0, ans=0.125 2024-08-20 06:46:46,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.297e+01 2.515e+01 2.879e+01 5.897e+01, threshold=5.029e+01, percent-clipped=1.0 2024-08-20 06:46:47,153 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 30 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 06:46:51,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4700400.0, ans=0.0 2024-08-20 06:47:08,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4700500.0, ans=0.125 2024-08-20 06:47:22,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4700500.0, ans=0.1 2024-08-20 06:47:22,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4700500.0, ans=0.125 2024-08-20 06:47:56,940 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 06:48:09,997 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10700, loss[loss=0.1012, beats_loss=0.01074, ecapa_loss=0.0001384, whisper_loss=0.08907, over 22103.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.09054, over 3847677.34 frames. ], batch size: 86, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:48:22,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.29 vs. limit=10.0 2024-08-20 06:48:50,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4701000.0, ans=0.125 2024-08-20 06:48:56,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4701000.0, ans=0.0 2024-08-20 06:49:14,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4701100.0, ans=0.125 2024-08-20 06:49:15,441 WARNING [optim.py:496] (3/4) Scaling gradients by 0.023652782663702965, model_norm_threshold=50.29466247558594 2024-08-20 06:49:15,618 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.689e+06, grad_sumsq=1.581e+08, orig_rms_sq=1.068e-02 2024-08-20 06:49:27,621 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 29 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 06:49:35,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4701200.0, ans=15.0 2024-08-20 06:49:46,674 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
31 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 06:49:47,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4701200.0, ans=0.125 2024-08-20 06:49:50,249 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10750, loss[loss=0.08129, beats_loss=0.01296, ecapa_loss=0.0001361, whisper_loss=0.06697, over 20266.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001397, whisper_loss=0.09001, over 3828137.93 frames. ], batch size: 84, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:50:03,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4701300.0, ans=0.125 2024-08-20 06:50:11,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.335e+01 2.524e+01 2.835e+01 2.126e+03, threshold=5.048e+01, percent-clipped=3.0 2024-08-20 06:50:23,023 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 06:50:54,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4701600.0, ans=0.125 2024-08-20 06:51:02,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4701600.0, ans=0.1 2024-08-20 06:51:09,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4701700.0, ans=0.0 2024-08-20 06:51:29,637 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10800, loss[loss=0.1132, beats_loss=0.006369, ecapa_loss=0.0001475, whisper_loss=0.1053, over 15496.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.000139, whisper_loss=0.09039, over 3837439.08 frames. 
], batch size: 55, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:51:39,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.74 vs. limit=15.0 2024-08-20 06:51:56,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4701900.0, ans=0.2 2024-08-20 06:52:20,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4702000.0, ans=0.0 2024-08-20 06:52:25,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4702000.0, ans=0.125 2024-08-20 06:52:31,827 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 16 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-20 06:52:32,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4702100.0, ans=0.125 2024-08-20 06:52:36,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4702100.0, ans=0.2 2024-08-20 06:52:45,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4702100.0, ans=0.0 2024-08-20 06:53:08,928 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10850, loss[loss=0.1005, beats_loss=0.01194, ecapa_loss=0.0001149, whisper_loss=0.0874, over 22484.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.000139, whisper_loss=0.08969, over 3790041.70 frames. 
], batch size: 89, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:53:22,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4702300.0, ans=0.125 2024-08-20 06:53:30,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.288e+01 2.456e+01 2.756e+01 3.873e+01, threshold=4.912e+01, percent-clipped=0.0 2024-08-20 06:53:38,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4702400.0, ans=0.1 2024-08-20 06:54:12,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4702600.0, ans=0.125 2024-08-20 06:54:22,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4702600.0, ans=0.125 2024-08-20 06:54:23,258 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 06:54:25,400 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 17 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-20 06:54:48,402 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10900, loss[loss=0.09928, beats_loss=0.01015, ecapa_loss=0.0001308, whisper_loss=0.08783, over 19874.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.000138, whisper_loss=0.08905, over 3798343.27 frames. ], batch size: 78, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:54:51,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4702800.0, ans=0.0 2024-08-20 06:54:55,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.09 vs. 
limit=15.0 2024-08-20 06:54:57,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4702800.0, ans=0.09899494936611666 2024-08-20 06:55:01,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-08-20 06:55:12,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-20 06:55:30,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4703000.0, ans=0.125 2024-08-20 06:55:30,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-08-20 06:55:56,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4703100.0, ans=0.0 2024-08-20 06:56:04,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4703200.0, ans=0.125 2024-08-20 06:56:13,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4703200.0, ans=0.1 2024-08-20 06:56:19,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4703200.0, ans=0.0 2024-08-20 06:56:25,276 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 10950, loss[loss=0.09724, beats_loss=0.005804, ecapa_loss=0.000234, whisper_loss=0.0891, over 12740.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001386, whisper_loss=0.08955, over 3805562.41 frames. 
], batch size: 50, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:56:34,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-20 06:56:40,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4703300.0, ans=0.0 2024-08-20 06:56:46,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4703400.0, ans=0.125 2024-08-20 06:56:47,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.264e+01 2.421e+01 2.646e+01 4.130e+01, threshold=4.843e+01, percent-clipped=0.0 2024-08-20 06:56:51,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2024-08-20 06:56:57,326 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04336608201265335, model_norm_threshold=48.42934799194336 2024-08-20 06:56:57,494 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.187e+05, grad_sumsq=2.051e+07, orig_rms_sq=1.067e-02 2024-08-20 06:57:08,671 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5 2024-08-20 06:57:09,575 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 06:57:56,658 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11000, loss[loss=0.09273, beats_loss=0.01161, ecapa_loss=0.0001482, whisper_loss=0.07964, over 17582.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001397, whisper_loss=0.09108, over 3804894.91 frames. 
], batch size: 74, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:57:59,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4703800.0, ans=0.125 2024-08-20 06:58:03,355 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06608612835407257, model_norm_threshold=48.42934799194336 2024-08-20 06:58:03,523 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.920e+04, grad_sumsq=8.920e+04, orig_rms_sq=1.000e+00 2024-08-20 06:58:25,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4703900.0, ans=0.0 2024-08-20 06:58:25,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4703900.0, ans=0.07 2024-08-20 06:58:30,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4704000.0, ans=0.0 2024-08-20 06:58:48,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4704100.0, ans=0.025 2024-08-20 06:58:49,237 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 22 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 06:59:14,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4704200.0, ans=0.125 2024-08-20 06:59:19,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4704200.0, ans=0.0 2024-08-20 06:59:22,014 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 06:59:23,038 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11050, loss[loss=0.08685, beats_loss=0.01215, ecapa_loss=0.0001399, whisper_loss=0.0733, over 19303.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001406, whisper_loss=0.09074, over 3828749.01 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:59:31,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4704300.0, ans=0.04949747468305833 2024-08-20 06:59:43,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.346e+01 2.551e+01 2.950e+01 1.117e+03, threshold=5.103e+01, percent-clipped=5.0 2024-08-20 06:59:50,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4704400.0, ans=0.125 2024-08-20 07:00:05,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4704500.0, ans=0.125 2024-08-20 07:00:10,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4704500.0, ans=0.2 2024-08-20 07:00:11,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=15.0 2024-08-20 07:00:23,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4704600.0, ans=0.0 2024-08-20 07:00:57,181 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11100, loss[loss=0.1096, beats_loss=0.008235, ecapa_loss=0.0001178, whisper_loss=0.1002, over 19535.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001409, whisper_loss=0.09097, over 3840744.88 frames. 
], batch size: 72, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:00:58,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4704800.0, ans=0.07 2024-08-20 07:01:02,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2024-08-20 07:01:33,224 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 07:01:40,853 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 20 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 07:01:53,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4705000.0, ans=0.125 2024-08-20 07:01:54,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-20 07:02:05,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4705100.0, ans=0.125 2024-08-20 07:02:09,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4705100.0, ans=0.125 2024-08-20 07:02:11,724 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 20 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 07:02:13,143 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.692e+00 2024-08-20 07:02:14,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4705200.0, ans=0.125 2024-08-20 07:02:18,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.26 vs. 
limit=15.0 2024-08-20 07:02:27,281 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 28 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 07:02:35,151 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11150, loss[loss=0.1078, beats_loss=0.009138, ecapa_loss=0.0001374, whisper_loss=0.09732, over 20630.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.09124, over 3865773.16 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:02:41,434 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 07:02:59,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=12.0 2024-08-20 07:02:59,645 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.354e+01 2.488e+01 2.886e+01 1.211e+02, threshold=4.976e+01, percent-clipped=2.0 2024-08-20 07:03:14,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4705400.0, ans=0.125 2024-08-20 07:03:26,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4705500.0, ans=0.1 2024-08-20 07:03:31,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4705500.0, ans=0.125 2024-08-20 07:03:35,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4705500.0, ans=0.07 2024-08-20 07:03:35,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2024-08-20 07:03:36,341 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
21 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 07:03:49,099 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 07:03:52,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4705600.0, ans=0.125 2024-08-20 07:03:56,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4705600.0, ans=0.1 2024-08-20 07:04:01,766 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09045316278934479, model_norm_threshold=49.755611419677734 2024-08-20 07:04:01,934 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.284e+04, grad_sumsq=3.284e+04, orig_rms_sq=1.000e+00 2024-08-20 07:04:17,681 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 26 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 07:04:21,340 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11200, loss[loss=0.1028, beats_loss=0.01111, ecapa_loss=0.0001541, whisper_loss=0.0902, over 17234.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001415, whisper_loss=0.09138, over 3877582.29 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:04:22,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=22.5 2024-08-20 07:04:25,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4705800.0, ans=0.0 2024-08-20 07:04:49,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.74 vs. 
limit=15.0 2024-08-20 07:04:54,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-08-20 07:05:07,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4706000.0, ans=0.05 2024-08-20 07:05:53,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4706200.0, ans=0.125 2024-08-20 07:06:00,538 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11250, loss[loss=0.09237, beats_loss=0.01106, ecapa_loss=0.0001103, whisper_loss=0.08021, over 22376.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.09158, over 3898830.64 frames. ], batch size: 84, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:06:16,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4706300.0, ans=0.0 2024-08-20 07:06:17,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4706300.0, ans=0.0 2024-08-20 07:06:23,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.303e+01 2.565e+01 2.928e+01 5.501e+02, threshold=5.130e+01, percent-clipped=1.0 2024-08-20 07:06:26,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4706400.0, ans=0.125 2024-08-20 07:06:45,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.39 vs. limit=22.5 2024-08-20 07:06:48,388 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 07:06:50,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4706500.0, ans=0.1 2024-08-20 07:06:57,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0 2024-08-20 07:07:02,666 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 27 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 07:07:03,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4706600.0, ans=0.125 2024-08-20 07:07:03,898 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 07:07:10,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4706600.0, ans=0.0 2024-08-20 07:07:36,912 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 22 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 07:07:40,645 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11300, loss[loss=0.1152, beats_loss=0.01034, ecapa_loss=0.0001616, whisper_loss=0.1032, over 15007.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.000141, whisper_loss=0.09118, over 3880653.22 frames. 
], batch size: 60, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:07:45,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4706800.0, ans=0.125 2024-08-20 07:07:46,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4706800.0, ans=0.125 2024-08-20 07:07:50,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4706800.0, ans=0.125 2024-08-20 07:07:50,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4706800.0, ans=0.2 2024-08-20 07:07:52,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-20 07:08:18,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4707000.0, ans=0.0 2024-08-20 07:08:23,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4707000.0, ans=10.0 2024-08-20 07:08:25,081 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 25 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 07:08:42,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4707100.0, ans=0.125 2024-08-20 07:09:14,571 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 07:09:16,182 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11350, loss[loss=0.1114, beats_loss=0.008904, ecapa_loss=0.0001504, whisper_loss=0.1009, over 21771.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.09112, over 3856293.04 frames. 
], batch size: 89, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:09:20,413 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 20 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-20 07:09:35,232 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 31 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-20 07:09:36,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.220e+01 2.467e+01 2.786e+01 5.186e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-20 07:09:49,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4707400.0, ans=0.1 2024-08-20 07:09:57,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0 2024-08-20 07:10:17,530 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2024-08-20 07:10:22,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4707600.0, ans=0.0 2024-08-20 07:10:35,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=12.0 2024-08-20 07:10:39,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4707700.0, ans=0.125 2024-08-20 07:10:49,673 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11400, loss[loss=0.1032, beats_loss=0.01194, ecapa_loss=0.0001335, whisper_loss=0.08991, over 18801.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.09128, over 3868656.68 frames. 
], batch size: 75, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:10:52,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=4707800.0, ans=0.2 2024-08-20 07:10:55,134 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 07:11:00,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4707800.0, ans=0.125 2024-08-20 07:11:01,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4707800.0, ans=0.125 2024-08-20 07:11:12,084 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 07:11:16,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4707900.0, ans=0.125 2024-08-20 07:11:21,348 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 07:12:23,561 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11450, loss[loss=0.09371, beats_loss=0.01288, ecapa_loss=0.0001377, whisper_loss=0.07945, over 20190.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.0914, over 3873071.20 frames. ], batch size: 86, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:12:37,500 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 07:12:45,496 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.318e+01 2.506e+01 2.910e+01 4.315e+01, threshold=5.012e+01, percent-clipped=0.0 2024-08-20 07:12:46,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.72 vs. 
limit=15.0 2024-08-20 07:12:52,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4708400.0, ans=0.0 2024-08-20 07:12:54,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-08-20 07:12:55,393 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 07:13:28,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4708600.0, ans=0.125 2024-08-20 07:13:28,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0 2024-08-20 07:13:51,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4708700.0, ans=0.0 2024-08-20 07:13:58,857 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11500, loss[loss=0.1153, beats_loss=0.008267, ecapa_loss=0.0001453, whisper_loss=0.1056, over 20579.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.09115, over 3861390.27 frames. 
], batch size: 81, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:14:00,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4708800.0, ans=0.125 2024-08-20 07:14:17,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4708900.0, ans=0.125 2024-08-20 07:14:25,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4708900.0, ans=0.125 2024-08-20 07:14:32,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4708900.0, ans=0.1 2024-08-20 07:14:51,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4709000.0, ans=0.2 2024-08-20 07:14:56,413 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 07:15:08,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4709100.0, ans=0.125 2024-08-20 07:15:27,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4709200.0, ans=0.0 2024-08-20 07:15:35,391 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 07:15:39,200 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11550, loss[loss=0.105, beats_loss=0.009043, ecapa_loss=0.000123, whisper_loss=0.09477, over 18134.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.09097, over 3863285.40 frames. 
], batch size: 67, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:16:01,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.302e+01 2.543e+01 2.812e+01 2.319e+02, threshold=5.086e+01, percent-clipped=3.0 2024-08-20 07:16:30,577 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 07:16:31,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2024-08-20 07:16:38,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-08-20 07:16:44,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4709600.0, ans=0.0 2024-08-20 07:16:44,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5 2024-08-20 07:16:50,161 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.361e+00 2024-08-20 07:17:02,720 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-20 07:17:05,067 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 25 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 07:17:15,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4709700.0, ans=15.0 2024-08-20 07:17:25,304 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.407e+01 2024-08-20 07:17:25,999 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11600, loss[loss=0.09486, beats_loss=0.01199, ecapa_loss=0.0001422, whisper_loss=0.08145, over 16652.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001386, whisper_loss=0.09084, over 3882987.70 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:17:28,057 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 17 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-20 07:18:14,680 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 07:18:18,967 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 07:18:28,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4710100.0, ans=0.0 2024-08-20 07:18:31,689 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 34 from LS+wenet, 33 from Vox, 26 fro AS 2024-08-20 07:18:34,041 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 07:18:36,606 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 25 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 07:18:59,455 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 07:19:06,163 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 29 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-20 07:19:07,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4710200.0, ans=0.0 2024-08-20 07:19:09,678 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11650, loss[loss=0.0676, beats_loss=0.01197, ecapa_loss=0.0001533, whisper_loss=0.0541, over 16599.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001387, whisper_loss=0.09064, over 3866666.21 frames. 
], batch size: 71, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:19:23,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4710300.0, ans=0.125 2024-08-20 07:19:33,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.347e+01 2.610e+01 2.991e+01 4.037e+01, threshold=5.219e+01, percent-clipped=0.0 2024-08-20 07:19:43,903 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 07:20:08,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4710500.0, ans=0.125 2024-08-20 07:20:09,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4710500.0, ans=0.125 2024-08-20 07:20:51,476 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11700, loss[loss=0.105, beats_loss=0.008609, ecapa_loss=0.0001489, whisper_loss=0.09486, over 20545.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001389, whisper_loss=0.09033, over 3848312.60 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:20:52,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4710800.0, ans=10.0 2024-08-20 07:21:02,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4710800.0, ans=0.125 2024-08-20 07:21:15,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2024-08-20 07:21:20,320 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 07:21:32,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2024-08-20 07:21:37,723 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 34 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 07:21:57,275 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 07:21:59,365 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 07:22:00,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2024-08-20 07:22:04,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4711100.0, ans=0.125 2024-08-20 07:22:32,002 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11750, loss[loss=0.07492, beats_loss=0.01074, ecapa_loss=0.0001438, whisper_loss=0.06275, over 17794.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001386, whisper_loss=0.08981, over 3826137.22 frames. ], batch size: 71, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:22:52,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4711400.0, ans=0.125 2024-08-20 07:22:55,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.306e+01 2.569e+01 2.913e+01 4.079e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-20 07:22:59,407 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 07:23:05,810 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 07:23:28,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-20 07:24:00,703 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 23 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 07:24:18,349 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11800, loss[loss=0.1087, beats_loss=0.01053, ecapa_loss=0.0001603, whisper_loss=0.09661, over 21850.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001394, whisper_loss=0.0908, over 3800830.19 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:24:22,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4711800.0, ans=0.1 2024-08-20 07:24:25,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4711800.0, ans=0.1 2024-08-20 07:24:32,762 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 07:24:50,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4711900.0, ans=0.125 2024-08-20 07:25:02,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4712000.0, ans=0.125 2024-08-20 07:25:14,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4712000.0, ans=0.05 2024-08-20 07:25:21,002 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 07:25:22,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4712100.0, ans=0.125 2024-08-20 07:25:33,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4712100.0, ans=0.0 2024-08-20 07:25:42,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4712200.0, ans=15.0 2024-08-20 07:26:02,565 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11850, loss[loss=0.08549, beats_loss=0.01114, ecapa_loss=0.0001119, whisper_loss=0.07323, over 21301.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.09007, over 3837625.91 frames. ], batch size: 83, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:26:23,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.57 vs. 
limit=12.0 2024-08-20 07:26:26,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.335e+01 2.520e+01 2.883e+01 3.441e+02, threshold=5.040e+01, percent-clipped=1.0 2024-08-20 07:26:26,898 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 07:27:13,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-20 07:27:15,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4712600.0, ans=0.125 2024-08-20 07:27:30,550 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 07:27:46,170 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11900, loss[loss=0.09055, beats_loss=0.01096, ecapa_loss=0.0001157, whisper_loss=0.07843, over 17607.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.000138, whisper_loss=0.09021, over 3810837.01 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:28:08,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2024-08-20 07:28:45,986 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 07:28:52,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4713100.0, ans=0.125 2024-08-20 07:28:53,092 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 07:29:21,515 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-20 07:29:23,021 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 11950, loss[loss=0.1052, beats_loss=0.008631, ecapa_loss=0.0001509, whisper_loss=0.09501, over 22225.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001385, whisper_loss=0.09015, over 3829626.33 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:29:37,131 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 26 from LS+wenet, 6 from Vox, 26 fro AS 2024-08-20 07:29:37,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4713300.0, ans=0.95 2024-08-20 07:29:43,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.385e+01 2.581e+01 2.869e+01 3.753e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-20 07:29:49,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4713400.0, ans=0.125 2024-08-20 07:29:57,919 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 07:30:04,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2024-08-20 07:30:43,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2024-08-20 07:30:44,088 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
18 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-20 07:30:55,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4713700.0, ans=0.125 2024-08-20 07:30:57,726 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12000, loss[loss=0.104, beats_loss=0.008974, ecapa_loss=0.0001287, whisper_loss=0.0937, over 17647.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001396, whisper_loss=0.09025, over 3836518.13 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:30:57,726 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 07:31:33,265 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.2692, 2.3194, 2.9927, 2.5655, 3.3191, 3.1039, 3.0365, 2.5874], device='cuda:3') 2024-08-20 07:31:33,863 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005087, whisper_loss=0.2481, over 931116.00 frames. 2024-08-20 07:31:56,072 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on SV_voxceleb1: loss=0.003908, beats_loss=0, ecapa_loss=0.0003908, whisper_loss=0, over 944235.00 frames. 2024-08-20 07:33:38,300 INFO [train_multi_KD3.py:1150] (3/4) Epoch 32, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 07:33:38,310 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 07:33:50,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. 
limit=6.0 2024-08-20 07:34:13,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4714000.0, ans=0.125 2024-08-20 07:34:17,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4714000.0, ans=0.125 2024-08-20 07:34:22,108 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 25 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 07:34:28,868 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 07:34:37,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4714100.0, ans=0.125 2024-08-20 07:34:44,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4714100.0, ans=0.2 2024-08-20 07:34:52,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=15.0 2024-08-20 07:35:06,507 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12050, loss[loss=0.1048, beats_loss=0.01136, ecapa_loss=0.0001495, whisper_loss=0.09196, over 23235.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001396, whisper_loss=0.09044, over 3868220.45 frames. ], batch size: 95, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:35:13,156 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 18 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-20 07:35:25,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.124e+01 2.460e+01 2.784e+01 4.386e+01, threshold=4.920e+01, percent-clipped=0.0 2024-08-20 07:35:36,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. 
limit=6.0 2024-08-20 07:35:36,907 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 07:35:43,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2024-08-20 07:36:14,341 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-20 07:36:15,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4714600.0, ans=0.0 2024-08-20 07:36:21,164 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 30 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 07:36:27,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4714700.0, ans=0.0 2024-08-20 07:36:34,860 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 26 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 07:36:39,075 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12100, loss[loss=0.1215, beats_loss=0.009251, ecapa_loss=0.0001462, whisper_loss=0.1108, over 20554.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001407, whisper_loss=0.0903, over 3856094.58 frames. ], batch size: 79, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:36:44,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0 2024-08-20 07:36:53,797 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
21 from LS+wenet, 23 from Vox, 50 fro AS 2024-08-20 07:36:56,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4714800.0, ans=0.0 2024-08-20 07:37:07,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4714900.0, ans=0.2 2024-08-20 07:37:11,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=22.5 2024-08-20 07:37:30,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4715000.0, ans=0.125 2024-08-20 07:37:47,075 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 07:37:48,068 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 20 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 07:37:51,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4715100.0, ans=0.07 2024-08-20 07:37:56,335 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 07:38:01,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4715100.0, ans=0.125 2024-08-20 07:38:09,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4715200.0, ans=0.09899494936611666 2024-08-20 07:38:23,894 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12150, loss[loss=0.08816, beats_loss=0.01284, ecapa_loss=0.0001271, whisper_loss=0.07405, over 17447.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.09071, over 3870085.93 frames. 
], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:38:25,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4715300.0, ans=0.1 2024-08-20 07:38:27,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4715300.0, ans=0.125 2024-08-20 07:38:47,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.297e+01 2.567e+01 2.958e+01 5.999e+01, threshold=5.133e+01, percent-clipped=2.0 2024-08-20 07:38:51,229 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 19 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-20 07:39:04,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4715500.0, ans=0.07 2024-08-20 07:39:08,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4715500.0, ans=0.1 2024-08-20 07:39:12,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4715500.0, ans=0.0 2024-08-20 07:39:18,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4715500.0, ans=0.1 2024-08-20 07:39:42,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4715700.0, ans=0.125 2024-08-20 07:39:58,938 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12200, loss[loss=0.0948, beats_loss=0.01021, ecapa_loss=0.0001531, whisper_loss=0.08306, over 20069.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001404, whisper_loss=0.09076, over 3818599.55 frames. 
], batch size: 85, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:39:59,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4715800.0, ans=0.07 2024-08-20 07:40:21,088 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 36 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 07:40:26,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4715900.0, ans=0.125 2024-08-20 07:40:44,956 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-20 07:40:46,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4716000.0, ans=0.2 2024-08-20 07:40:55,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4716100.0, ans=0.125 2024-08-20 07:41:10,883 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 17 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 07:41:16,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4716200.0, ans=0.035 2024-08-20 07:41:21,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4716200.0, ans=0.95 2024-08-20 07:41:22,373 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-20 07:41:24,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4716300.0, ans=0.125 2024-08-20 07:41:25,580 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12250, loss[loss=0.1118, beats_loss=0.008462, ecapa_loss=0.0001372, whisper_loss=0.102, over 19980.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.09087, over 3837617.40 frames. 
], batch size: 77, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:41:32,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4716300.0, ans=0.5 2024-08-20 07:41:45,067 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.278e+01 2.601e+01 2.929e+01 4.392e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-20 07:41:52,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.47 vs. limit=12.0 2024-08-20 07:42:00,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4716500.0, ans=0.2 2024-08-20 07:42:03,796 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 07:42:09,260 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 07:42:24,607 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 07:42:51,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4716700.0, ans=0.125 2024-08-20 07:42:53,967 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12300, loss[loss=0.108, beats_loss=0.009487, ecapa_loss=0.0001585, whisper_loss=0.09696, over 21503.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001399, whisper_loss=0.09051, over 3820520.25 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:42:54,144 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
30 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-20 07:42:58,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4716800.0, ans=0.125 2024-08-20 07:43:03,371 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 19 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-20 07:43:07,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2024-08-20 07:43:32,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4717000.0, ans=0.07 2024-08-20 07:43:32,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4717000.0, ans=0.125 2024-08-20 07:43:39,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4717000.0, ans=0.0 2024-08-20 07:43:45,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4717100.0, ans=0.5 2024-08-20 07:43:49,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4717100.0, ans=0.125 2024-08-20 07:44:00,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4717100.0, ans=0.025 2024-08-20 07:44:12,644 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 07:44:23,199 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12350, loss[loss=0.1051, beats_loss=0.01121, ecapa_loss=0.0001325, whisper_loss=0.09252, over 21553.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001394, whisper_loss=0.09088, over 3837339.54 frames. 
], batch size: 84, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:44:44,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.339e+01 2.566e+01 2.988e+01 1.086e+02, threshold=5.133e+01, percent-clipped=1.0 2024-08-20 07:44:44,445 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 07:44:47,658 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 07:44:50,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4717400.0, ans=0.0 2024-08-20 07:44:59,232 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 07:45:44,148 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 07:45:55,828 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12400, loss[loss=0.1106, beats_loss=0.00855, ecapa_loss=0.0001516, whisper_loss=0.1006, over 18883.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001392, whisper_loss=0.0905, over 3851631.99 frames. ], batch size: 77, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:46:17,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4717900.0, ans=0.125 2024-08-20 07:46:29,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-20 07:46:50,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.51 vs. 
limit=15.0 2024-08-20 07:47:01,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4718100.0, ans=0.125 2024-08-20 07:47:23,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4718300.0, ans=0.125 2024-08-20 07:47:24,634 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12450, loss[loss=0.09993, beats_loss=0.01232, ecapa_loss=0.0001303, whisper_loss=0.08631, over 20765.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001386, whisper_loss=0.08949, over 3823029.33 frames. ], batch size: 81, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:47:35,083 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 07:47:43,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.285e+01 2.465e+01 2.724e+01 4.543e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-20 07:47:46,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4718400.0, ans=0.125 2024-08-20 07:48:23,268 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 07:48:24,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4718600.0, ans=0.1 2024-08-20 07:48:41,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4718700.0, ans=0.1 2024-08-20 07:48:44,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4718700.0, ans=0.2 2024-08-20 07:48:52,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4718700.0, ans=0.1 2024-08-20 07:48:58,442 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12500, loss[loss=0.111, beats_loss=0.01011, ecapa_loss=0.0001345, whisper_loss=0.0995, over 24196.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.000138, whisper_loss=0.08962, over 3863511.91 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:49:06,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4718800.0, ans=0.2 2024-08-20 07:49:23,679 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
16 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-20 07:49:27,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4718900.0, ans=0.0 2024-08-20 07:49:39,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4719000.0, ans=0.1 2024-08-20 07:49:57,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4719100.0, ans=0.0 2024-08-20 07:50:20,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4719200.0, ans=0.125 2024-08-20 07:50:28,511 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12550, loss[loss=0.1009, beats_loss=0.01185, ecapa_loss=0.0001306, whisper_loss=0.08773, over 15540.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001382, whisper_loss=0.0898, over 3833755.71 frames. ], batch size: 60, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:50:39,146 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 07:50:44,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4719300.0, ans=0.1 2024-08-20 07:50:48,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.246e+01 2.517e+01 2.953e+01 4.531e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-20 07:50:56,985 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 07:51:00,250 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 07:51:06,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4719500.0, ans=0.0 2024-08-20 07:51:24,886 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 07:51:27,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2024-08-20 07:51:34,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4719600.0, ans=0.0 2024-08-20 07:51:39,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4719700.0, ans=0.1 2024-08-20 07:51:41,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4719700.0, ans=0.0 2024-08-20 07:51:55,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=4719700.0, ans=0.1 2024-08-20 07:51:59,663 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12600, loss[loss=0.1035, beats_loss=0.009845, ecapa_loss=0.0001385, whisper_loss=0.09229, over 22128.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.000139, whisper_loss=0.0903, over 3811913.95 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:52:24,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4719900.0, ans=0.95 2024-08-20 07:52:41,444 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 07:53:05,140 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
25 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-20 07:53:33,461 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12650, loss[loss=0.09974, beats_loss=0.01175, ecapa_loss=0.000128, whisper_loss=0.08671, over 18680.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09035, over 3804150.10 frames. ], batch size: 72, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:53:36,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4720300.0, ans=0.1 2024-08-20 07:53:50,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4720400.0, ans=0.0 2024-08-20 07:53:53,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2024-08-20 07:53:53,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.384e+01 2.676e+01 2.977e+01 1.190e+02, threshold=5.353e+01, percent-clipped=5.0 2024-08-20 07:53:57,250 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 24 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 07:53:58,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4720400.0, ans=0.2 2024-08-20 07:54:02,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2024-08-20 07:54:12,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2024-08-20 07:54:20,868 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 07:54:34,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4720600.0, ans=0.1 2024-08-20 07:54:51,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4720700.0, ans=0.0 2024-08-20 07:54:53,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4720700.0, ans=0.125 2024-08-20 07:55:02,709 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12700, loss[loss=0.1012, beats_loss=0.009734, ecapa_loss=0.0001238, whisper_loss=0.09019, over 13759.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001385, whisper_loss=0.08959, over 3814754.40 frames. ], batch size: 51, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:55:05,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=22.5 2024-08-20 07:55:12,112 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 07:55:13,510 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 07:55:20,213 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 34 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-20 07:55:23,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4720900.0, ans=0.2 2024-08-20 07:55:46,875 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.306e-01 2024-08-20 07:55:47,753 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 07:55:53,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2024-08-20 07:55:58,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4721100.0, ans=0.1 2024-08-20 07:56:13,843 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.119e+00 2024-08-20 07:56:22,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2024-08-20 07:56:28,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4721200.0, ans=0.0 2024-08-20 07:56:29,115 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 23 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 07:56:33,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4721300.0, ans=0.125 2024-08-20 07:56:34,117 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12750, loss[loss=0.1074, beats_loss=0.009233, ecapa_loss=0.0001568, whisper_loss=0.09655, over 23215.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001391, whisper_loss=0.09022, over 3818401.46 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:56:37,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=22.5 2024-08-20 07:56:49,948 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-20 07:56:52,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.492e+01 2.698e+01 4.644e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-20 07:56:57,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4721400.0, ans=0.125 2024-08-20 07:57:45,411 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 16 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-20 07:57:56,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4721700.0, ans=0.0 2024-08-20 07:58:02,313 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12800, loss[loss=0.1107, beats_loss=0.009982, ecapa_loss=0.00016, whisper_loss=0.09911, over 22309.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.09049, over 3844454.63 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:58:06,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4721800.0, ans=0.2 2024-08-20 07:58:08,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4721800.0, ans=0.125 2024-08-20 07:58:31,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4721900.0, ans=0.1 2024-08-20 07:59:11,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4722100.0, ans=0.09899494936611666 2024-08-20 07:59:25,717 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 07:59:28,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4722200.0, ans=0.125 2024-08-20 07:59:33,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=12.0 2024-08-20 07:59:36,533 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12850, loss[loss=0.06781, beats_loss=0.01244, ecapa_loss=0.0001466, whisper_loss=0.0539, over 18735.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001385, whisper_loss=0.08994, over 3851986.50 frames. ], batch size: 77, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:59:36,766 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 07:59:56,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.213e+01 2.468e+01 2.785e+01 8.515e+01, threshold=4.935e+01, percent-clipped=2.0 2024-08-20 08:00:00,489 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 15 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-20 08:00:15,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4722500.0, ans=0.0 2024-08-20 08:00:22,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4722500.0, ans=0.0 2024-08-20 08:00:30,601 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 16 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 08:00:41,307 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 08:00:56,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4722700.0, ans=0.125 2024-08-20 08:00:58,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4722700.0, ans=0.0 2024-08-20 08:01:04,946 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12900, loss[loss=0.1075, beats_loss=0.00857, ecapa_loss=0.0001212, whisper_loss=0.09773, over 14787.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001388, whisper_loss=0.08985, over 3810749.94 frames. ], batch size: 55, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:01:27,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4722900.0, ans=0.1 2024-08-20 08:01:50,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-20 08:02:08,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4723100.0, ans=0.125 2024-08-20 08:02:10,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4723100.0, ans=0.125 2024-08-20 08:02:19,048 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 32 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-20 08:02:23,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4723200.0, ans=0.125 2024-08-20 08:02:23,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4723200.0, ans=0.1 2024-08-20 08:02:24,970 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
30 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 08:02:25,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4723200.0, ans=0.125 2024-08-20 08:02:35,026 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 12950, loss[loss=0.09933, beats_loss=0.009776, ecapa_loss=0.0001542, whisper_loss=0.08801, over 21924.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001393, whisper_loss=0.0905, over 3807930.64 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:02:56,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.320e+01 2.461e+01 2.898e+01 1.890e+02, threshold=4.922e+01, percent-clipped=4.0 2024-08-20 08:02:56,334 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-20 08:03:16,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.26 vs. limit=10.0 2024-08-20 08:03:20,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2024-08-20 08:03:26,660 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.296e+01 2024-08-20 08:03:27,544 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 08:03:33,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4723600.0, ans=0.0 2024-08-20 08:03:36,552 WARNING [optim.py:496] (3/4) Scaling gradients by 0.012246229685842991, model_norm_threshold=49.221561431884766 2024-08-20 08:03:36,719 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.214e+06, grad_sumsq=3.009e+08, orig_rms_sq=1.068e-02 2024-08-20 08:04:02,139 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 22 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 08:04:07,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4723800.0, ans=0.0 2024-08-20 08:04:07,926 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13000, loss[loss=0.1124, beats_loss=0.009504, ecapa_loss=0.0001535, whisper_loss=0.1013, over 19896.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001391, whisper_loss=0.08972, over 3771624.13 frames. ], batch size: 81, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:04:44,155 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 20 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 08:05:08,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2024-08-20 08:05:15,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4724100.0, ans=0.0 2024-08-20 08:05:16,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4724100.0, ans=0.125 2024-08-20 08:05:19,442 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 08:05:41,810 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13050, loss[loss=0.1224, beats_loss=0.007364, ecapa_loss=0.0001369, whisper_loss=0.1137, over 19146.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001401, whisper_loss=0.09035, over 3792656.96 frames. ], batch size: 72, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:05:45,791 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03136618062853813, model_norm_threshold=49.221561431884766 2024-08-20 08:05:45,958 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.784e+05, grad_sumsq=1.450e+05, orig_rms_sq=3.300e+00 2024-08-20 08:06:03,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.318e+01 2.513e+01 2.841e+01 4.019e+03, threshold=5.026e+01, percent-clipped=3.0 2024-08-20 08:06:13,823 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 24 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-20 08:06:25,725 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 08:06:32,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.62 vs. limit=22.5 2024-08-20 08:06:38,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4724500.0, ans=0.0 2024-08-20 08:06:40,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4724600.0, ans=0.0 2024-08-20 08:06:50,971 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 24 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 08:07:00,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. 
limit=12.0 2024-08-20 08:07:04,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2024-08-20 08:07:21,155 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13100, loss[loss=0.113, beats_loss=0.007991, ecapa_loss=0.0001554, whisper_loss=0.1034, over 21337.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001409, whisper_loss=0.08988, over 3829285.63 frames. ], batch size: 82, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:07:31,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-08-20 08:07:36,951 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 23 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 08:07:47,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4724900.0, ans=0.2 2024-08-20 08:08:26,176 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 35 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 08:08:26,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4725100.0, ans=0.0 2024-08-20 08:08:28,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2024-08-20 08:08:35,087 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 29 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 08:08:40,613 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:08:54,641 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13150, loss[loss=0.1131, beats_loss=0.009953, ecapa_loss=0.00013, whisper_loss=0.1018, over 23408.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001409, whisper_loss=0.08988, over 3809453.14 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:09:15,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4725400.0, ans=0.0 2024-08-20 08:09:16,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+01 2.265e+01 2.500e+01 2.860e+01 8.543e+01, threshold=5.000e+01, percent-clipped=2.0 2024-08-20 08:09:16,511 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 30 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-20 08:09:23,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4725400.0, ans=0.125 2024-08-20 08:09:29,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4725400.0, ans=0.1 2024-08-20 08:09:42,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4725500.0, ans=0.0 2024-08-20 08:09:43,671 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 12 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 08:09:47,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4725500.0, ans=0.125 2024-08-20 08:09:47,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4725500.0, ans=0.1 2024-08-20 08:09:55,452 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 08:10:08,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4725600.0, ans=0.2 2024-08-20 08:10:27,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4725700.0, ans=0.0 2024-08-20 08:10:29,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4725700.0, ans=0.0 2024-08-20 08:10:31,889 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13200, loss[loss=0.09119, beats_loss=0.01052, ecapa_loss=0.0001577, whisper_loss=0.07909, over 17954.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.08982, over 3786968.37 frames. ], batch size: 75, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:10:48,406 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 08:10:58,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4725900.0, ans=0.1 2024-08-20 08:11:01,215 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 08:11:02,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4725900.0, ans=0.125 2024-08-20 08:11:27,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4726100.0, ans=0.125 2024-08-20 08:11:47,674 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 29 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 08:11:51,199 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 08:12:03,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4726200.0, ans=0.125 2024-08-20 08:12:05,813 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13250, loss[loss=0.09765, beats_loss=0.01243, ecapa_loss=0.0001107, whisper_loss=0.08411, over 16224.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.08976, over 3817985.58 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:12:15,488 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 30 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 08:12:19,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4726300.0, ans=0.125 2024-08-20 08:12:24,692 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 21 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 08:12:26,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.372e+01 2.599e+01 3.015e+01 7.004e+01, threshold=5.197e+01, percent-clipped=3.0 2024-08-20 08:12:43,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. 
limit=15.0 2024-08-20 08:12:48,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4726500.0, ans=0.0 2024-08-20 08:12:52,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4726500.0, ans=0.0 2024-08-20 08:13:00,274 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.116e+05 2024-08-20 08:13:02,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4726600.0, ans=0.125 2024-08-20 08:13:17,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4726600.0, ans=0.1 2024-08-20 08:13:40,492 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13300, loss[loss=0.11, beats_loss=0.009986, ecapa_loss=0.000139, whisper_loss=0.09864, over 23208.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.08937, over 3810836.79 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:13:46,413 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 20 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 08:13:49,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4726800.0, ans=0.0 2024-08-20 08:13:53,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4726800.0, ans=0.125 2024-08-20 08:13:54,574 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 08:14:00,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. 
limit=22.5 2024-08-20 08:14:08,034 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:14:18,812 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 25 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-20 08:14:43,247 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 08:14:52,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4727100.0, ans=0.0 2024-08-20 08:14:52,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4727100.0, ans=0.05 2024-08-20 08:14:54,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4727200.0, ans=0.125 2024-08-20 08:14:58,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4727200.0, ans=0.2 2024-08-20 08:15:05,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4727200.0, ans=0.025 2024-08-20 08:15:07,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4727200.0, ans=0.0 2024-08-20 08:15:14,037 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13350, loss[loss=0.112, beats_loss=0.008635, ecapa_loss=0.0001429, whisper_loss=0.102, over 12879.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001416, whisper_loss=0.0897, over 3804083.66 frames. 
], batch size: 50, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:15:17,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4727300.0, ans=0.1 2024-08-20 08:15:34,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.328e+01 2.528e+01 2.746e+01 3.166e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-20 08:15:46,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4727400.0, ans=0.125 2024-08-20 08:15:51,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4727500.0, ans=0.1 2024-08-20 08:15:57,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4727500.0, ans=0.2 2024-08-20 08:16:09,091 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 22 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-20 08:16:25,905 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 08:16:36,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.83 vs. limit=15.0 2024-08-20 08:16:37,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4727700.0, ans=0.1 2024-08-20 08:16:46,255 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13400, loss[loss=0.09757, beats_loss=0.009603, ecapa_loss=0.0001711, whisper_loss=0.08626, over 20389.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.08985, over 3799021.60 frames. 
], batch size: 85, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:16:55,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4727800.0, ans=0.0 2024-08-20 08:17:08,932 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:17:35,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4728000.0, ans=0.1 2024-08-20 08:18:04,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4728200.0, ans=0.125 2024-08-20 08:18:13,923 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 08:18:17,774 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13450, loss[loss=0.1173, beats_loss=0.004911, ecapa_loss=0.0002001, whisper_loss=0.1104, over 13353.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001415, whisper_loss=0.09047, over 3781886.75 frames. ], batch size: 52, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:18:36,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-20 08:18:39,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.294e+01 2.576e+01 2.808e+01 3.727e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-20 08:18:41,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4728400.0, ans=0.125 2024-08-20 08:18:42,844 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 08:18:46,204 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
19 from LS+wenet, 8 from Vox, 30 fro AS 2024-08-20 08:18:59,198 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 21 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 08:19:01,316 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 32 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 08:19:09,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4728500.0, ans=0.0 2024-08-20 08:19:15,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4728600.0, ans=0.0 2024-08-20 08:19:43,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2024-08-20 08:19:51,644 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13500, loss[loss=0.108, beats_loss=0.008739, ecapa_loss=0.0001254, whisper_loss=0.098, over 23124.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001399, whisper_loss=0.08997, over 3776024.40 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:19:53,944 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 08:20:03,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4728800.0, ans=0.125 2024-08-20 08:20:11,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4728900.0, ans=0.1 2024-08-20 08:20:25,854 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 22 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-20 08:20:28,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4729000.0, ans=0.1 2024-08-20 08:20:47,070 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 08:20:47,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4729100.0, ans=0.125 2024-08-20 08:21:31,024 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13550, loss[loss=0.1078, beats_loss=0.008222, ecapa_loss=0.0001512, whisper_loss=0.09812, over 15771.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01029, ecapa_loss=0.0001396, whisper_loss=0.09036, over 3777871.31 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:21:38,595 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:21:52,018 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.276e+01 2.469e+01 2.814e+01 4.223e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-20 08:21:53,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4729400.0, ans=0.015 2024-08-20 08:21:58,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4729400.0, ans=0.1 2024-08-20 08:22:20,154 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 12 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 08:22:24,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4729500.0, ans=0.2 2024-08-20 08:22:46,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4729700.0, ans=0.025 2024-08-20 08:23:05,608 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13600, loss[loss=0.0999, beats_loss=0.01164, ecapa_loss=0.0001447, whisper_loss=0.08681, over 18229.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01027, ecapa_loss=0.00014, whisper_loss=0.09098, over 3763044.22 frames. 
], batch size: 74, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:23:16,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. limit=10.0 2024-08-20 08:23:19,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4729800.0, ans=0.0 2024-08-20 08:23:41,668 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.265e+05 2024-08-20 08:24:22,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.11 vs. limit=10.0 2024-08-20 08:24:38,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-08-20 08:24:43,165 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13650, loss[loss=0.0997, beats_loss=0.008062, ecapa_loss=0.0002068, whisper_loss=0.08957, over 19130.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001399, whisper_loss=0.09086, over 3790457.05 frames. ], batch size: 81, lr: 1.90e-03, grad_scale: 1.152921504606847e+18 2024-08-20 08:24:43,408 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 08:24:53,212 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
24 from LS+wenet, 13 from Vox, 14 fro AS 2024-08-20 08:25:01,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4730400.0, ans=0.2 2024-08-20 08:25:03,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.366e+01 2.588e+01 2.806e+01 4.523e+02, threshold=5.175e+01, percent-clipped=1.0 2024-08-20 08:25:08,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4730400.0, ans=0.125 2024-08-20 08:25:26,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2024-08-20 08:25:31,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4730500.0, ans=0.1 2024-08-20 08:25:41,987 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 08:25:54,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4730600.0, ans=10.0 2024-08-20 08:25:57,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=4730700.0, ans=0.02 2024-08-20 08:26:01,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4730700.0, ans=0.0 2024-08-20 08:26:05,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4730700.0, ans=0.1 2024-08-20 08:26:15,101 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13700, loss[loss=0.1162, beats_loss=0.008781, ecapa_loss=0.0001664, whisper_loss=0.1057, over 17022.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001398, whisper_loss=0.09033, over 3730336.61 frames. 
], batch size: 67, lr: 1.90e-03, grad_scale: 1.152921504606847e+18 2024-08-20 08:26:15,254 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 18 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 08:26:15,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4730800.0, ans=0.125 2024-08-20 08:26:22,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4730800.0, ans=0.0 2024-08-20 08:26:45,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4730900.0, ans=0.0 2024-08-20 08:27:18,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4731100.0, ans=0.125 2024-08-20 08:27:30,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4731200.0, ans=0.0 2024-08-20 08:27:48,703 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13750, loss[loss=0.1049, beats_loss=0.01083, ecapa_loss=0.0001077, whisper_loss=0.09297, over 23308.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001384, whisper_loss=0.09032, over 3772457.75 frames. 
], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:27:54,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4731300.0, ans=0.2 2024-08-20 08:28:10,306 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.242e+01 2.511e+01 2.832e+01 4.850e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-20 08:28:12,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4731400.0, ans=0.04949747468305833 2024-08-20 08:28:17,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4731400.0, ans=0.125 2024-08-20 08:28:19,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4731400.0, ans=0.125 2024-08-20 08:28:22,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4731400.0, ans=0.1 2024-08-20 08:28:24,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4731500.0, ans=0.125 2024-08-20 08:28:45,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4731600.0, ans=0.125 2024-08-20 08:28:48,531 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 08:29:02,729 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 08:29:07,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4731700.0, ans=0.04949747468305833 2024-08-20 08:29:11,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4731700.0, ans=0.0 2024-08-20 08:29:22,251 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13800, loss[loss=0.09851, beats_loss=0.01281, ecapa_loss=0.000125, whisper_loss=0.08445, over 17044.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001386, whisper_loss=0.08924, over 3750680.23 frames. ], batch size: 71, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:29:23,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4731800.0, ans=0.125 2024-08-20 08:29:31,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4731800.0, ans=0.0 2024-08-20 08:29:31,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4731800.0, ans=0.025 2024-08-20 08:29:35,049 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 08:29:37,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4731800.0, ans=0.125 2024-08-20 08:29:50,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4731900.0, ans=0.125 2024-08-20 08:29:58,823 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
11 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 08:30:09,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4732000.0, ans=0.0 2024-08-20 08:30:14,113 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-20 08:30:16,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4732000.0, ans=0.1 2024-08-20 08:30:32,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5 2024-08-20 08:30:39,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4732200.0, ans=0.125 2024-08-20 08:30:54,836 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13850, loss[loss=0.1069, beats_loss=0.01174, ecapa_loss=0.0001191, whisper_loss=0.09399, over 18700.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001391, whisper_loss=0.08964, over 3777971.55 frames. ], batch size: 72, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:31:15,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.631e+01 2.257e+01 2.391e+01 2.623e+01 3.979e+01, threshold=4.782e+01, percent-clipped=0.0 2024-08-20 08:31:27,029 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
25 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-20 08:31:39,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4732500.0, ans=0.125 2024-08-20 08:32:01,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4732600.0, ans=0.0 2024-08-20 08:32:26,084 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13900, loss[loss=0.0862, beats_loss=0.01243, ecapa_loss=0.0001092, whisper_loss=0.07268, over 22729.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001386, whisper_loss=0.08961, over 3744592.06 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:32:37,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.06 vs. limit=22.5 2024-08-20 08:32:41,975 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 08:32:56,032 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 08:33:11,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4733000.0, ans=0.2 2024-08-20 08:33:26,839 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 08:33:37,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2024-08-20 08:33:40,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4733200.0, ans=0.125 2024-08-20 08:33:42,220 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 
27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 08:33:54,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4733300.0, ans=0.125 2024-08-20 08:33:55,668 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 13950, loss[loss=0.1135, beats_loss=0.0104, ecapa_loss=0.0001299, whisper_loss=0.1018, over 22566.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001394, whisper_loss=0.08949, over 3763317.32 frames. ], batch size: 87, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:34:01,980 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 08:34:05,719 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 24 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 08:34:17,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.254e+01 2.490e+01 2.803e+01 3.566e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 08:34:23,635 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 08:34:56,277 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 17 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 08:35:08,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4733700.0, ans=0.1 2024-08-20 08:35:15,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4733700.0, ans=0.125 2024-08-20 08:35:16,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4733700.0, ans=0.125 2024-08-20 08:35:26,935 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14000, loss[loss=0.1141, beats_loss=0.01162, ecapa_loss=0.0001045, whisper_loss=0.1014, over 18248.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001396, whisper_loss=0.08977, over 3782473.31 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:35:50,057 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:35:53,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4733900.0, ans=0.07 2024-08-20 08:36:25,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4734100.0, ans=0.125 2024-08-20 08:36:33,787 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 08:36:40,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2024-08-20 08:36:56,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4734200.0, ans=0.0 2024-08-20 08:37:00,708 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14050, loss[loss=0.126, beats_loss=0.007835, ecapa_loss=0.0001399, whisper_loss=0.1168, over 18069.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001395, whisper_loss=0.08976, over 3749400.05 frames. ], batch size: 69, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:37:22,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.283e+01 2.494e+01 2.922e+01 5.594e+01, threshold=4.987e+01, percent-clipped=2.0 2024-08-20 08:37:32,236 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-20 08:37:44,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4734500.0, ans=0.0 2024-08-20 08:37:57,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4734600.0, ans=0.1 2024-08-20 08:38:19,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0 2024-08-20 08:38:27,133 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:38:31,627 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14100, loss[loss=0.09628, beats_loss=0.01039, ecapa_loss=0.0001328, whisper_loss=0.08456, over 18388.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001395, whisper_loss=0.08961, over 3743287.07 frames. ], batch size: 73, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:38:49,334 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 08:38:52,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2024-08-20 08:38:59,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4734900.0, ans=0.1 2024-08-20 08:39:11,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4735000.0, ans=0.125 2024-08-20 08:39:14,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4735000.0, ans=0.0 2024-08-20 08:39:22,109 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 08:39:44,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4735200.0, ans=0.0 2024-08-20 08:39:44,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4735200.0, ans=0.0 2024-08-20 08:39:59,283 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:40:01,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4735300.0, ans=0.2 2024-08-20 08:40:01,887 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14150, loss[loss=0.117, beats_loss=0.008469, ecapa_loss=0.0001785, whisper_loss=0.1068, over 21988.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001402, whisper_loss=0.0899, over 3773008.59 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:40:15,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4735300.0, ans=0.1 2024-08-20 08:40:22,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.235e+01 2.465e+01 2.825e+01 7.434e+01, threshold=4.929e+01, percent-clipped=1.0 2024-08-20 08:40:40,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4735500.0, ans=0.0 2024-08-20 08:40:56,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4735600.0, ans=0.0 2024-08-20 08:41:10,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4735600.0, ans=0.0 2024-08-20 08:41:10,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, 
num_channels=192, metric=11.03 vs. limit=22.5 2024-08-20 08:41:12,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4735700.0, ans=0.0 2024-08-20 08:41:19,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4735700.0, ans=0.125 2024-08-20 08:41:26,956 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 15 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 08:41:30,426 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14200, loss[loss=0.1186, beats_loss=0.009148, ecapa_loss=0.0001457, whisper_loss=0.108, over 18768.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.09029, over 3746666.99 frames. ], batch size: 73, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:41:47,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4735900.0, ans=0.2 2024-08-20 08:42:19,601 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 08:42:32,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4736100.0, ans=0.1 2024-08-20 08:42:33,808 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 21 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 08:42:39,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4736100.0, ans=0.125 2024-08-20 08:42:44,411 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 08:43:02,147 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14250, loss[loss=0.1079, beats_loss=0.01101, ecapa_loss=0.0001209, whisper_loss=0.09566, over 21952.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.0103, ecapa_loss=0.0001403, whisper_loss=0.09131, over 3731980.49 frames. ], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:43:24,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.313e+01 2.520e+01 2.754e+01 4.470e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-20 08:43:48,146 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 08:44:17,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4736700.0, ans=0.0 2024-08-20 08:44:35,092 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14300, loss[loss=0.1143, beats_loss=0.009872, ecapa_loss=0.0001136, whisper_loss=0.1033, over 23694.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01027, ecapa_loss=0.0001406, whisper_loss=0.09118, over 3751152.89 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:44:58,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4736900.0, ans=0.125 2024-08-20 08:45:10,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=22.5 2024-08-20 08:45:16,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5 2024-08-20 08:45:20,638 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 08:45:36,195 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 08:45:53,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. 
limit=6.0 2024-08-20 08:46:05,498 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14350, loss[loss=0.1257, beats_loss=0.008078, ecapa_loss=0.0001159, whisper_loss=0.1165, over 23959.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001408, whisper_loss=0.09077, over 3762189.61 frames. ], batch size: 87, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:46:09,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4737300.0, ans=0.125 2024-08-20 08:46:23,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4737400.0, ans=0.0 2024-08-20 08:46:24,965 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 08:46:26,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.359e+01 2.648e+01 3.006e+01 2.772e+02, threshold=5.296e+01, percent-clipped=2.0 2024-08-20 08:47:04,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2024-08-20 08:47:15,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-20 08:47:32,703 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14400, loss[loss=0.09876, beats_loss=0.0112, ecapa_loss=0.0001335, whisper_loss=0.08622, over 21567.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001408, whisper_loss=0.09002, over 3765577.78 frames. ], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:47:55,598 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 08:47:58,251 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
25 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 08:48:31,555 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 27 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 08:48:32,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4738100.0, ans=0.125 2024-08-20 08:48:58,123 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 08:49:03,617 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14450, loss[loss=0.07132, beats_loss=0.0125, ecapa_loss=0.0001332, whisper_loss=0.05749, over 23429.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01033, ecapa_loss=0.0001412, whisper_loss=0.08977, over 3754258.47 frames. ], batch size: 96, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:49:12,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.11 vs. limit=6.0 2024-08-20 08:49:17,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4738300.0, ans=0.05 2024-08-20 08:49:24,728 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.293e+01 2.479e+01 2.732e+01 7.579e+01, threshold=4.957e+01, percent-clipped=1.0 2024-08-20 08:49:53,481 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 29 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 08:50:07,094 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 08:50:08,709 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-20 08:50:24,670 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 19 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-20 08:50:35,681 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 08:50:37,008 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14500, loss[loss=0.09983, beats_loss=0.01081, ecapa_loss=0.0001197, whisper_loss=0.08782, over 23908.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.08998, over 3787406.32 frames. ], batch size: 94, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:50:41,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2024-08-20 08:50:47,336 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 08:51:05,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4738900.0, ans=0.1 2024-08-20 08:51:08,507 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 08:51:35,683 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 08:51:36,943 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 08:51:53,687 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-20 08:52:00,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4739200.0, ans=0.125 2024-08-20 08:52:11,791 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14550, loss[loss=0.09045, beats_loss=0.01013, ecapa_loss=0.0001405, whisper_loss=0.07891, over 21058.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001391, whisper_loss=0.08912, over 3800054.13 frames. 
], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:52:34,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.256e+01 2.477e+01 2.723e+01 4.705e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 08:52:38,623 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 08:53:05,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-20 08:53:19,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4739600.0, ans=0.125 2024-08-20 08:53:26,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-08-20 08:53:29,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2024-08-20 08:53:30,713 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 08:53:35,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4739700.0, ans=0.2 2024-08-20 08:53:38,142 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-20 08:53:41,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4739700.0, ans=0.1 2024-08-20 08:53:44,330 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14600, loss[loss=0.09402, beats_loss=0.009732, ecapa_loss=0.0001297, whisper_loss=0.08299, over 16576.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001384, whisper_loss=0.08995, over 3790044.32 frames. 
], batch size: 65, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:53:54,576 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 08:54:44,482 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 08:54:47,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4740100.0, ans=0.0 2024-08-20 08:54:53,964 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-20 08:55:10,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4740200.0, ans=0.125 2024-08-20 08:55:16,419 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14650, loss[loss=0.1129, beats_loss=0.008101, ecapa_loss=0.000164, whisper_loss=0.1032, over 17507.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001387, whisper_loss=0.08976, over 3786902.67 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:55:19,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4740300.0, ans=0.1 2024-08-20 08:55:24,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.77 vs. 
limit=15.0 2024-08-20 08:55:37,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4740400.0, ans=0.125 2024-08-20 08:55:38,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.329e+01 2.529e+01 2.848e+01 4.887e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-20 08:56:09,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4740600.0, ans=0.0 2024-08-20 08:56:13,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4740600.0, ans=10.0 2024-08-20 08:56:45,544 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14700, loss[loss=0.083, beats_loss=0.01274, ecapa_loss=0.0001434, whisper_loss=0.06883, over 21924.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001398, whisper_loss=0.08992, over 3822881.14 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:56:56,687 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 24 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 08:57:10,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0 2024-08-20 08:57:21,261 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 26 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 08:57:28,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4741000.0, ans=0.1 2024-08-20 08:57:40,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.80 vs. 
limit=15.0 2024-08-20 08:57:46,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4741100.0, ans=0.0 2024-08-20 08:57:59,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-20 08:58:02,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.23 vs. limit=15.0 2024-08-20 08:58:03,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=10.0 2024-08-20 08:58:05,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-20 08:58:14,282 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 08:58:15,526 INFO [train_multi_KD3.py:1117] (3/4) Epoch 32, batch 14750, loss[loss=0.1037, beats_loss=0.01137, ecapa_loss=0.0001526, whisper_loss=0.09083, over 22969.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001393, whisper_loss=0.0894, over 3806980.37 frames. 
], batch size: 95, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:58:36,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.385e+01 2.604e+01 3.059e+01 5.323e+01, threshold=5.208e+01, percent-clipped=1.0 2024-08-20 08:58:41,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4741400.0, ans=0.125 2024-08-20 08:58:42,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4741400.0, ans=0.125 2024-08-20 08:59:02,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4741500.0, ans=0.0 2024-08-20 08:59:32,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4741700.0, ans=0.0 2024-08-20 09:00:12,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4741780.0, ans=0.125 2024-08-20 09:00:13,081 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 0, loss[loss=0.07717, beats_loss=0.01124, ecapa_loss=0.0001232, whisper_loss=0.06469, over 20957.00 frames. ], tot_loss[loss=0.07717, beats_loss=0.01124, ecapa_loss=0.0001232, whisper_loss=0.06469, over 20957.00 frames. ], batch size: 85, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:00:13,082 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 09:00:48,215 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005003, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 09:01:09,256 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on SV_voxceleb1: loss=0.003963, beats_loss=0, ecapa_loss=0.0003963, whisper_loss=0, over 944235.00 frames. 
2024-08-20 09:02:51,327 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on AT_audioset: loss=0.02307, beats_loss=0.02307, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 09:02:51,330 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 09:02:59,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4741780.0, ans=0.2 2024-08-20 09:03:07,434 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 29 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-20 09:03:24,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4741880.0, ans=0.0 2024-08-20 09:03:30,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4741880.0, ans=0.2 2024-08-20 09:03:55,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4741980.0, ans=0.125 2024-08-20 09:03:57,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4741980.0, ans=0.0 2024-08-20 09:03:59,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4741980.0, ans=0.09899494936611666 2024-08-20 09:04:00,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.60 vs. 
limit=15.0 2024-08-20 09:04:16,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4742080.0, ans=0.2 2024-08-20 09:04:19,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4742080.0, ans=0.125 2024-08-20 09:04:21,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4742080.0, ans=0.125 2024-08-20 09:04:57,998 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 50, loss[loss=0.09733, beats_loss=0.00931, ecapa_loss=0.0001639, whisper_loss=0.08638, over 22272.00 frames. ], tot_loss[loss=0.09897, beats_loss=0.009395, ecapa_loss=0.0001435, whisper_loss=0.08814, over 882048.94 frames. ], batch size: 90, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:05:04,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-20 09:05:13,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4742280.0, ans=0.125 2024-08-20 09:05:23,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4742380.0, ans=0.125 2024-08-20 09:05:24,344 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 09:05:31,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.495e+01 2.772e+01 3.142e+01 4.372e+01, threshold=5.543e+01, percent-clipped=0.0 2024-08-20 09:06:04,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4742480.0, ans=0.125 2024-08-20 09:06:40,305 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 
27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 09:06:41,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4742680.0, ans=0.07 2024-08-20 09:06:46,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4742680.0, ans=0.0 2024-08-20 09:06:53,978 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 100, loss[loss=0.06975, beats_loss=0.01277, ecapa_loss=0.0001124, whisper_loss=0.05586, over 15953.00 frames. ], tot_loss[loss=0.09904, beats_loss=0.009186, ecapa_loss=0.0001427, whisper_loss=0.08843, over 1504153.41 frames. ], batch size: 64, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:07:10,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2024-08-20 09:07:14,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4742780.0, ans=0.125 2024-08-20 09:07:21,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4742880.0, ans=0.125 2024-08-20 09:07:52,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-20 09:07:53,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4742980.0, ans=0.2 2024-08-20 09:08:24,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4743180.0, ans=0.125 2024-08-20 09:08:43,850 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 150, loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001403, whisper_loss=0.09011, over 16030.00 frames. 
], tot_loss[loss=0.09926, beats_loss=0.009216, ecapa_loss=0.0001417, whisper_loss=0.08863, over 1997851.14 frames. ], batch size: 63, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:08:51,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4743280.0, ans=0.5 2024-08-20 09:08:55,911 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 09:09:01,113 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.797e+00 2024-08-20 09:09:04,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4743380.0, ans=0.1 2024-08-20 09:09:11,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.463e+01 2.692e+01 3.124e+01 4.669e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-20 09:09:17,324 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 09:09:22,796 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 23 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 09:09:31,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4743480.0, ans=0.125 2024-08-20 09:09:58,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4743680.0, ans=0.1 2024-08-20 09:10:17,690 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 200, loss[loss=0.1043, beats_loss=0.008131, ecapa_loss=0.0001589, whisper_loss=0.09458, over 14018.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.009455, ecapa_loss=0.0001423, whisper_loss=0.09003, over 2383581.11 frames. 
], batch size: 55, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:10:30,976 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.655e-03 2024-08-20 09:10:46,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4743880.0, ans=0.125 2024-08-20 09:10:48,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4743880.0, ans=0.2 2024-08-20 09:10:50,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-08-20 09:11:08,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4743980.0, ans=0.1 2024-08-20 09:11:10,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4744080.0, ans=0.2 2024-08-20 09:11:15,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4744080.0, ans=0.125 2024-08-20 09:11:41,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0 2024-08-20 09:11:45,349 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 250, loss[loss=0.09738, beats_loss=0.01201, ecapa_loss=0.0001166, whisper_loss=0.0842, over 18448.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009723, ecapa_loss=0.0001412, whisper_loss=0.09062, over 2677286.41 frames. 
], batch size: 73, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:11:51,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4744280.0, ans=0.1 2024-08-20 09:11:55,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0 2024-08-20 09:12:09,773 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.364e+01 2.600e+01 2.936e+01 1.943e+02, threshold=5.200e+01, percent-clipped=2.0 2024-08-20 09:12:12,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.76 vs. limit=6.0 2024-08-20 09:12:14,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0 2024-08-20 09:12:15,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2024-08-20 09:12:22,248 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 16 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 09:12:26,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4744480.0, ans=0.125 2024-08-20 09:12:29,542 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 09:12:32,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4744480.0, ans=0.0 2024-08-20 09:12:40,088 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 09:12:51,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4744580.0, ans=0.09899494936611666 2024-08-20 09:13:00,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4744680.0, ans=0.125 2024-08-20 09:13:13,947 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 300, loss[loss=0.0917, beats_loss=0.01185, ecapa_loss=0.0001336, whisper_loss=0.07851, over 18931.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01001, ecapa_loss=0.0001397, whisper_loss=0.08948, over 2900307.14 frames. ], batch size: 73, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:13:30,071 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 10 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-20 09:13:58,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2024-08-20 09:14:34,848 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 09:14:43,504 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 350, loss[loss=0.1067, beats_loss=0.009549, ecapa_loss=0.0001317, whisper_loss=0.09583, over 21235.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01024, ecapa_loss=0.0001368, whisper_loss=0.08874, over 3064373.39 frames. ], batch size: 81, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:14:54,282 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 09:15:08,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.237e+01 2.517e+01 2.824e+01 3.334e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-20 09:15:25,087 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 09:15:25,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4745480.0, ans=0.125 2024-08-20 09:15:31,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4745480.0, ans=0.1 2024-08-20 09:15:36,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4745580.0, ans=0.125 2024-08-20 09:15:49,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4745580.0, ans=0.0 2024-08-20 09:15:50,877 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 09:15:53,848 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.320e+00 2024-08-20 09:16:04,759 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 18 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 09:16:08,474 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 09:16:15,549 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 400, loss[loss=0.09014, beats_loss=0.01006, ecapa_loss=0.0001858, whisper_loss=0.07821, over 18408.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.01036, ecapa_loss=0.0001357, whisper_loss=0.08819, over 3192965.89 frames. ], batch size: 78, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:16:15,776 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 09:16:51,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4745980.0, ans=0.1 2024-08-20 09:17:10,336 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 
34 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-20 09:17:14,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4746080.0, ans=0.125 2024-08-20 09:17:16,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4746080.0, ans=0.125 2024-08-20 09:17:25,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4746080.0, ans=0.125 2024-08-20 09:17:27,136 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 29 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-20 09:17:37,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4746180.0, ans=0.2 2024-08-20 09:17:47,697 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 450, loss[loss=0.07959, beats_loss=0.01327, ecapa_loss=0.0001215, whisper_loss=0.0651, over 17113.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0103, ecapa_loss=0.000137, whisper_loss=0.08839, over 3328666.36 frames. ], batch size: 69, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:17:58,701 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 09:18:12,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.270e+01 2.468e+01 2.712e+01 4.275e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-20 09:18:16,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.60 vs. 
limit=12.0 2024-08-20 09:18:18,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=4746380.0, ans=15.0 2024-08-20 09:18:19,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4746380.0, ans=0.125 2024-08-20 09:18:23,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4746480.0, ans=0.125 2024-08-20 09:18:42,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4746580.0, ans=0.125 2024-08-20 09:18:46,184 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 09:18:46,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=12.0 2024-08-20 09:18:55,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.16 vs. limit=15.0 2024-08-20 09:18:56,009 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 09:19:01,766 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 09:19:18,910 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 500, loss[loss=0.1199, beats_loss=0.009876, ecapa_loss=0.0001514, whisper_loss=0.1085, over 23160.00 frames. ], tot_loss[loss=0.09987, beats_loss=0.01029, ecapa_loss=0.000137, whisper_loss=0.08821, over 3443162.52 frames. ], batch size: 90, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:19:32,614 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
24 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-20 09:20:03,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-20 09:20:45,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4747180.0, ans=0.09899494936611666 2024-08-20 09:20:50,233 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 550, loss[loss=0.0853, beats_loss=0.01131, ecapa_loss=0.0001118, whisper_loss=0.07287, over 22752.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01038, ecapa_loss=0.0001371, whisper_loss=0.08856, over 3543225.16 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:20:53,092 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:20:56,089 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 27 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-20 09:21:11,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4747380.0, ans=0.0 2024-08-20 09:21:12,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2024-08-20 09:21:14,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.517e+01 2.843e+01 4.116e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-20 09:21:24,287 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 23 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 09:21:27,784 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 27 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 09:21:46,073 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
30 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 09:22:22,670 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 600, loss[loss=0.1068, beats_loss=0.009252, ecapa_loss=0.0001368, whisper_loss=0.09621, over 21315.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01026, ecapa_loss=0.0001374, whisper_loss=0.08917, over 3627830.76 frames. ], batch size: 82, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:22:27,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4747780.0, ans=0.125 2024-08-20 09:22:30,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4747780.0, ans=0.125 2024-08-20 09:22:35,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4747780.0, ans=0.125 2024-08-20 09:23:13,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4747980.0, ans=0.1 2024-08-20 09:23:15,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4747980.0, ans=0.2 2024-08-20 09:23:20,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2024-08-20 09:23:28,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=22.5 2024-08-20 09:23:36,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. 
limit=15.0 2024-08-20 09:23:43,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4748180.0, ans=0.0 2024-08-20 09:23:52,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4748280.0, ans=0.125 2024-08-20 09:23:52,935 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 650, loss[loss=0.09331, beats_loss=0.01132, ecapa_loss=0.0001337, whisper_loss=0.08065, over 17639.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01023, ecapa_loss=0.0001371, whisper_loss=0.08984, over 3681621.18 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:24:14,217 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 23 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 09:24:15,695 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 18 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 09:24:17,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.327e+01 2.614e+01 2.843e+01 3.937e+01, threshold=5.228e+01, percent-clipped=0.0 2024-08-20 09:24:18,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=22.5 2024-08-20 09:24:28,143 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 16 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 09:24:57,719 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
20 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-20 09:25:00,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4748580.0, ans=0.2 2024-08-20 09:25:18,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4748680.0, ans=0.125 2024-08-20 09:25:21,280 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 700, loss[loss=0.07866, beats_loss=0.01294, ecapa_loss=0.0001307, whisper_loss=0.06441, over 22659.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01023, ecapa_loss=0.0001382, whisper_loss=0.08917, over 3689120.37 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:25:48,265 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 24 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 09:25:57,613 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 09:26:04,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4748980.0, ans=0.025 2024-08-20 09:26:12,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4749080.0, ans=0.2 2024-08-20 09:26:25,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4749080.0, ans=0.2 2024-08-20 09:26:45,615 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 20 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-20 09:26:48,806 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 750, loss[loss=0.09914, beats_loss=0.009379, ecapa_loss=0.0001398, whisper_loss=0.08836, over 21900.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01027, ecapa_loss=0.0001374, whisper_loss=0.08942, over 3685595.65 frames. 
], batch size: 86, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:26:56,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4749280.0, ans=0.125 2024-08-20 09:27:13,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.302e+01 2.530e+01 2.816e+01 3.828e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-20 09:27:20,745 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 09:27:40,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4749580.0, ans=0.125 2024-08-20 09:27:47,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4749580.0, ans=0.0 2024-08-20 09:27:51,165 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:28:18,123 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 800, loss[loss=0.1092, beats_loss=0.01097, ecapa_loss=0.0001156, whisper_loss=0.09708, over 17364.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01026, ecapa_loss=0.0001368, whisper_loss=0.08929, over 3711897.70 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:28:42,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4749880.0, ans=15.0 2024-08-20 09:28:42,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.13 vs. limit=15.0 2024-08-20 09:28:53,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-08-20 09:28:54,073 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 09:28:56,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4749980.0, ans=0.125 2024-08-20 09:28:58,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4749980.0, ans=0.125 2024-08-20 09:29:03,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4749980.0, ans=0.125 2024-08-20 09:29:10,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4750080.0, ans=0.125 2024-08-20 09:29:18,460 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 11 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 09:29:36,203 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 25 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 09:29:39,974 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 09:29:46,189 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 850, loss[loss=0.1399, beats_loss=0.007783, ecapa_loss=0.0001084, whisper_loss=0.131, over 20228.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01029, ecapa_loss=0.0001361, whisper_loss=0.08944, over 3731854.74 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:29:51,342 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 09:29:54,161 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.69 vs. 
limit=15.0 2024-08-20 09:29:56,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4750280.0, ans=0.125 2024-08-20 09:30:03,257 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 21 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-20 09:30:11,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.195e+01 2.440e+01 2.729e+01 3.750e+01, threshold=4.881e+01, percent-clipped=0.0 2024-08-20 09:30:15,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4750380.0, ans=0.0 2024-08-20 09:30:16,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4750380.0, ans=0.125 2024-08-20 09:30:47,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4750580.0, ans=0.1 2024-08-20 09:31:03,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-08-20 09:31:09,798 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 20 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 09:31:11,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4750680.0, ans=0.2 2024-08-20 09:31:13,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4750680.0, ans=0.2 2024-08-20 09:31:15,785 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 900, loss[loss=0.08979, beats_loss=0.01079, ecapa_loss=9.861e-05, whisper_loss=0.07801, over 15611.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01031, ecapa_loss=0.000136, whisper_loss=0.08924, over 3763953.36 frames. 
], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:31:18,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4750780.0, ans=0.0 2024-08-20 09:31:22,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=4750780.0, ans=15.0 2024-08-20 09:31:39,428 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 30 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 09:31:39,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4750880.0, ans=0.0 2024-08-20 09:31:41,530 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 25 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 09:31:41,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4750880.0, ans=0.125 2024-08-20 09:31:52,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2024-08-20 09:32:17,529 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 09:32:21,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-08-20 09:32:43,817 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 950, loss[loss=0.1016, beats_loss=0.008046, ecapa_loss=0.0001263, whisper_loss=0.09233, over 13676.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01027, ecapa_loss=0.0001369, whisper_loss=0.08893, over 3736717.43 frames. 
], batch size: 50, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:32:54,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4751280.0, ans=0.0 2024-08-20 09:33:08,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.210e+01 2.427e+01 2.730e+01 1.118e+02, threshold=4.854e+01, percent-clipped=2.0 2024-08-20 09:33:16,556 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 09:33:38,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=12.0 2024-08-20 09:33:39,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4751580.0, ans=0.125 2024-08-20 09:33:45,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4751580.0, ans=0.0 2024-08-20 09:34:01,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4751680.0, ans=0.2 2024-08-20 09:34:12,751 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1000, loss[loss=0.09351, beats_loss=0.01113, ecapa_loss=0.0001436, whisper_loss=0.08094, over 22518.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01027, ecapa_loss=0.0001378, whisper_loss=0.08842, over 3730030.11 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:34:15,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4751780.0, ans=0.125 2024-08-20 09:34:22,536 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 09:34:31,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4751880.0, ans=0.2 2024-08-20 09:34:50,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-20 09:34:51,699 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-20 09:34:53,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4751980.0, ans=0.07 2024-08-20 09:34:59,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4751980.0, ans=0.0 2024-08-20 09:35:03,437 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 09:35:14,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4752080.0, ans=0.125 2024-08-20 09:35:23,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4752180.0, ans=0.125 2024-08-20 09:35:42,300 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1050, loss[loss=0.1045, beats_loss=0.01163, ecapa_loss=0.0001344, whisper_loss=0.09156, over 13173.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01025, ecapa_loss=0.0001381, whisper_loss=0.08837, over 3751338.77 frames. ], batch size: 52, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:35:52,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. 
limit=15.0 2024-08-20 09:36:04,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.92 vs. limit=22.5 2024-08-20 09:36:05,818 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 09:36:08,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.242e+01 2.597e+01 2.833e+01 4.409e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-20 09:36:16,152 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 12 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 09:36:19,936 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 19 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-20 09:36:20,466 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=22.5 2024-08-20 09:36:49,021 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 33 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 09:36:51,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-20 09:36:55,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4752680.0, ans=0.125 2024-08-20 09:37:09,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4752680.0, ans=0.125 2024-08-20 09:37:12,232 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1100, loss[loss=0.09737, beats_loss=0.009553, ecapa_loss=0.0001658, whisper_loss=0.08616, over 12529.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01023, ecapa_loss=0.0001386, whisper_loss=0.08853, over 3763716.13 frames. 
], batch size: 53, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:37:13,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5 2024-08-20 09:37:14,581 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 09:37:16,388 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 20 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-20 09:37:37,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4752880.0, ans=0.125 2024-08-20 09:37:47,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4752880.0, ans=0.125 2024-08-20 09:37:47,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=22.5 2024-08-20 09:37:49,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4752980.0, ans=0.1 2024-08-20 09:37:53,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-20 09:38:16,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2024-08-20 09:38:18,243 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 23 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 09:38:28,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4753180.0, ans=0.125 2024-08-20 09:38:38,011 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
16 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 09:38:41,125 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 13 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 09:38:42,099 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1150, loss[loss=0.07501, beats_loss=0.01222, ecapa_loss=0.0001696, whisper_loss=0.0611, over 12930.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01026, ecapa_loss=0.000138, whisper_loss=0.08874, over 3763771.19 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:38:46,082 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 21 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-20 09:38:50,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4753280.0, ans=0.1 2024-08-20 09:38:56,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2024-08-20 09:39:06,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.346e+01 2.635e+01 2.990e+01 2.498e+02, threshold=5.271e+01, percent-clipped=4.0 2024-08-20 09:39:31,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.73 vs. limit=10.0 2024-08-20 09:39:32,907 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 09:39:34,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4753580.0, ans=0.0 2024-08-20 09:39:59,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4753680.0, ans=0.125 2024-08-20 09:40:10,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4753780.0, ans=0.125 2024-08-20 09:40:11,337 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1200, loss[loss=0.09384, beats_loss=0.01337, ecapa_loss=0.0001626, whisper_loss=0.07884, over 16126.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.08847, over 3781299.06 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:40:22,453 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 20 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-20 09:40:37,004 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 09:40:47,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4753980.0, ans=0.0 2024-08-20 09:41:04,388 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 09:41:39,240 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1250, loss[loss=0.09128, beats_loss=0.009354, ecapa_loss=0.0001249, whisper_loss=0.08068, over 15141.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01034, ecapa_loss=0.0001377, whisper_loss=0.08852, over 3761429.19 frames. 
], batch size: 56, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:41:41,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4754280.0, ans=0.0 2024-08-20 09:41:47,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4754280.0, ans=0.0 2024-08-20 09:42:02,724 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 09:42:05,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.750e+01 2.979e+01 6.876e+01, threshold=5.500e+01, percent-clipped=2.0 2024-08-20 09:42:06,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4754380.0, ans=0.125 2024-08-20 09:42:09,708 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 15 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 09:42:10,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4754380.0, ans=0.1 2024-08-20 09:42:25,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4754480.0, ans=0.125 2024-08-20 09:42:39,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4754580.0, ans=0.1 2024-08-20 09:42:48,455 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 09:42:54,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4754680.0, ans=0.125 2024-08-20 09:43:07,368 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1300, loss[loss=0.07621, beats_loss=0.01145, ecapa_loss=0.000147, whisper_loss=0.06329, over 18174.00 frames. 
], tot_loss[loss=0.09965, beats_loss=0.01023, ecapa_loss=0.0001375, whisper_loss=0.08804, over 3721590.04 frames. ], batch size: 74, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:43:35,620 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:43:37,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4754880.0, ans=0.125 2024-08-20 09:43:39,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4754880.0, ans=0.125 2024-08-20 09:43:41,116 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 09:43:46,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4754980.0, ans=0.04949747468305833 2024-08-20 09:43:50,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4754980.0, ans=0.125 2024-08-20 09:44:21,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4755180.0, ans=0.5 2024-08-20 09:44:23,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4755180.0, ans=0.125 2024-08-20 09:44:25,866 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 17 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-20 09:44:37,944 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1350, loss[loss=0.08873, beats_loss=0.01062, ecapa_loss=0.000118, whisper_loss=0.07692, over 13995.00 frames. ], tot_loss[loss=0.09976, beats_loss=0.01033, ecapa_loss=0.0001371, whisper_loss=0.08807, over 3734912.03 frames. 
], batch size: 52, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:44:51,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4755280.0, ans=0.0 2024-08-20 09:45:04,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.108e+01 2.418e+01 2.623e+01 3.290e+01, threshold=4.836e+01, percent-clipped=0.0 2024-08-20 09:45:06,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-08-20 09:45:22,838 WARNING [optim.py:496] (3/4) Scaling gradients by 0.032859351485967636, model_norm_threshold=48.36314392089844 2024-08-20 09:45:23,008 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.250e+05, grad_sumsq=3.697e+04, orig_rms_sq=8.792e+00 2024-08-20 09:45:37,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4755580.0, ans=0.125 2024-08-20 09:45:38,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4755580.0, ans=0.1 2024-08-20 09:45:41,850 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 09:45:43,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4755580.0, ans=0.0 2024-08-20 09:46:08,155 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1400, loss[loss=0.1009, beats_loss=0.01049, ecapa_loss=0.0001229, whisper_loss=0.08915, over 18352.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.08823, over 3738563.61 frames. 
], batch size: 69, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:46:12,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4755780.0, ans=0.125 2024-08-20 09:46:14,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4755780.0, ans=0.125 2024-08-20 09:46:21,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4755780.0, ans=0.125 2024-08-20 09:46:27,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4755880.0, ans=0.5 2024-08-20 09:46:33,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4755880.0, ans=0.95 2024-08-20 09:46:37,136 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 16 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 09:46:40,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4755880.0, ans=0.1 2024-08-20 09:46:44,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2024-08-20 09:46:54,254 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 09:47:02,659 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 09:47:06,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4756080.0, ans=0.0 2024-08-20 09:47:17,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4756180.0, ans=0.0 2024-08-20 09:47:29,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2024-08-20 09:47:34,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4756280.0, ans=15.0 2024-08-20 09:47:35,276 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1450, loss[loss=0.06758, beats_loss=0.01177, ecapa_loss=0.0001731, whisper_loss=0.05408, over 13097.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01032, ecapa_loss=0.0001375, whisper_loss=0.08844, over 3734265.60 frames. ], batch size: 54, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:47:40,637 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 09:47:45,901 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 21 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-20 09:47:57,924 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 09:48:01,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.208e+01 2.529e+01 2.783e+01 1.472e+03, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 09:48:03,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4756380.0, ans=0.125 2024-08-20 09:49:08,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4756580.0, ans=0.0 2024-08-20 09:49:30,537 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1500, loss[loss=0.1124, beats_loss=0.009335, ecapa_loss=0.0001324, whisper_loss=0.1018, over 17841.00 frames. ], tot_loss[loss=0.09961, beats_loss=0.01025, ecapa_loss=0.000137, whisper_loss=0.08799, over 3724259.20 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:49:44,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4756780.0, ans=0.0 2024-08-20 09:49:46,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4756780.0, ans=0.125 2024-08-20 09:50:15,815 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 21 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 09:50:19,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4756980.0, ans=0.0 2024-08-20 09:50:20,815 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 30 from Vox, 15 fro AS 2024-08-20 09:50:26,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. 
limit=15.0 2024-08-20 09:50:30,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4757080.0, ans=0.125 2024-08-20 09:50:35,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2024-08-20 09:50:51,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4757180.0, ans=0.0 2024-08-20 09:50:59,841 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 31 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 09:51:02,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.00 vs. limit=22.5 2024-08-20 09:51:03,017 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1550, loss[loss=0.1085, beats_loss=0.008276, ecapa_loss=0.0001439, whisper_loss=0.09874, over 19168.00 frames. ], tot_loss[loss=0.09926, beats_loss=0.01025, ecapa_loss=0.0001373, whisper_loss=0.08763, over 3705363.17 frames. 
], batch size: 73, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:51:18,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4757280.0, ans=0.125 2024-08-20 09:51:22,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4757380.0, ans=0.0 2024-08-20 09:51:30,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.231e+01 2.477e+01 2.793e+01 4.044e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 09:51:40,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4757480.0, ans=0.125 2024-08-20 09:51:44,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4757480.0, ans=0.1 2024-08-20 09:52:06,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=22.5 2024-08-20 09:52:07,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4757580.0, ans=0.0 2024-08-20 09:52:08,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=8.0 2024-08-20 09:52:23,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4757680.0, ans=0.2 2024-08-20 09:52:35,569 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1600, loss[loss=0.1102, beats_loss=0.008333, ecapa_loss=0.0001253, whisper_loss=0.1006, over 18220.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01029, ecapa_loss=0.0001355, whisper_loss=0.08831, over 3758481.99 frames. 
], batch size: 69, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:52:38,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-20 09:53:04,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.29 vs. limit=12.0 2024-08-20 09:53:19,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-20 09:53:23,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4757980.0, ans=0.125 2024-08-20 09:53:38,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4758080.0, ans=0.0 2024-08-20 09:53:49,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4758180.0, ans=0.09899494936611666 2024-08-20 09:53:59,833 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 09:54:06,367 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1650, loss[loss=0.09886, beats_loss=0.01067, ecapa_loss=0.0001535, whisper_loss=0.08665, over 21835.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01021, ecapa_loss=0.0001363, whisper_loss=0.08886, over 3767590.86 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:54:10,601 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 28 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 09:54:13,718 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 09:54:32,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.203e+01 2.449e+01 2.785e+01 3.857e+01, threshold=4.898e+01, percent-clipped=0.0 2024-08-20 09:54:47,360 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 23 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-20 09:54:47,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4758480.0, ans=0.125 2024-08-20 09:55:04,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4758580.0, ans=0.125 2024-08-20 09:55:11,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4758580.0, ans=0.125 2024-08-20 09:55:32,230 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 09:55:32,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4758680.0, ans=0.0 2024-08-20 09:55:34,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4758780.0, ans=0.125 2024-08-20 09:55:35,280 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1700, loss[loss=0.1053, beats_loss=0.008785, ecapa_loss=0.0001256, whisper_loss=0.09529, over 17492.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01015, ecapa_loss=0.0001366, whisper_loss=0.0894, over 3707684.72 frames. ], batch size: 64, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:56:14,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4758980.0, ans=0.1 2024-08-20 09:56:14,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.58 vs. 
limit=15.0 2024-08-20 09:56:47,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4759180.0, ans=0.0 2024-08-20 09:57:06,568 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1750, loss[loss=0.1103, beats_loss=0.009967, ecapa_loss=0.0001147, whisper_loss=0.09922, over 15950.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01019, ecapa_loss=0.000136, whisper_loss=0.0891, over 3714928.03 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:57:12,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-08-20 09:57:12,950 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 09:57:24,362 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 09:57:33,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.295e+01 2.510e+01 2.716e+01 9.441e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 09:57:34,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=22.5 2024-08-20 09:57:41,992 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 31 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-20 09:57:46,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4759480.0, ans=0.1 2024-08-20 09:57:46,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.68 vs. limit=12.0 2024-08-20 09:57:46,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.68 vs. 
limit=15.0 2024-08-20 09:58:01,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4759580.0, ans=0.0 2024-08-20 09:58:17,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-20 09:58:25,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4759680.0, ans=0.125 2024-08-20 09:58:34,337 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1800, loss[loss=0.07853, beats_loss=0.008398, ecapa_loss=0.0001215, whisper_loss=0.06892, over 14394.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01025, ecapa_loss=0.000135, whisper_loss=0.08865, over 3732456.90 frames. ], batch size: 52, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:58:55,119 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 09:59:02,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4759880.0, ans=0.0 2024-08-20 09:59:16,446 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 17 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 09:59:48,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4760180.0, ans=0.125 2024-08-20 09:59:48,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4760180.0, ans=0.125 2024-08-20 09:59:51,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4760180.0, ans=0.2 2024-08-20 10:00:01,337 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1850, loss[loss=0.0664, beats_loss=0.01264, ecapa_loss=0.0001257, whisper_loss=0.0525, over 14999.00 frames. 
], tot_loss[loss=0.09958, beats_loss=0.01031, ecapa_loss=0.0001347, whisper_loss=0.08792, over 3700001.62 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:00:11,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4760280.0, ans=0.95 2024-08-20 10:00:12,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4760280.0, ans=0.0 2024-08-20 10:00:27,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.285e+01 2.493e+01 2.881e+01 4.103e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-20 10:01:04,626 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 10:01:27,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4760780.0, ans=0.125 2024-08-20 10:01:28,641 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1900, loss[loss=0.08467, beats_loss=0.01021, ecapa_loss=0.0001195, whisper_loss=0.07326, over 15475.00 frames. ], tot_loss[loss=0.09996, beats_loss=0.01027, ecapa_loss=0.0001355, whisper_loss=0.08833, over 3715414.94 frames. ], batch size: 57, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:01:46,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4760880.0, ans=0.0 2024-08-20 10:01:53,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.72 vs. 
limit=15.0 2024-08-20 10:02:15,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4760980.0, ans=0.125 2024-08-20 10:02:31,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4761080.0, ans=0.125 2024-08-20 10:02:44,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4761180.0, ans=0.125 2024-08-20 10:02:49,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4761180.0, ans=0.2 2024-08-20 10:02:49,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2024-08-20 10:02:54,997 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 1950, loss[loss=0.07846, beats_loss=0.01036, ecapa_loss=0.000122, whisper_loss=0.06688, over 14890.00 frames. ], tot_loss[loss=0.09987, beats_loss=0.01031, ecapa_loss=0.0001343, whisper_loss=0.08821, over 3734502.92 frames. ], batch size: 59, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:03:02,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2024-08-20 10:03:03,562 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 10:03:19,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.238e+01 2.475e+01 2.855e+01 5.978e+01, threshold=4.950e+01, percent-clipped=1.0 2024-08-20 10:03:46,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4761580.0, ans=0.125 2024-08-20 10:03:51,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4761580.0, ans=0.0 2024-08-20 10:04:00,778 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 20 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 10:04:07,488 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 10:04:20,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=22.5 2024-08-20 10:04:20,888 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2000, loss[loss=0.1184, beats_loss=0.01049, ecapa_loss=0.0001401, whisper_loss=0.1065, over 22724.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.0103, ecapa_loss=0.0001342, whisper_loss=0.08827, over 3725666.41 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:04:23,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4761780.0, ans=0.0 2024-08-20 10:04:50,211 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 16 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 10:05:25,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-08-20 10:05:40,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.14 vs. 
limit=15.0 2024-08-20 10:05:46,737 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2050, loss[loss=0.08714, beats_loss=0.01169, ecapa_loss=0.0001508, whisper_loss=0.07394, over 22014.00 frames. ], tot_loss[loss=0.09996, beats_loss=0.01024, ecapa_loss=0.0001336, whisper_loss=0.08839, over 3721930.45 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:05:49,248 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 10:05:56,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4762280.0, ans=0.125 2024-08-20 10:06:13,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.228e+01 2.469e+01 2.687e+01 4.353e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-20 10:06:28,040 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 10:07:04,008 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 25 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 10:07:12,942 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2100, loss[loss=0.08732, beats_loss=0.01227, ecapa_loss=0.0001127, whisper_loss=0.07392, over 13587.00 frames. ], tot_loss[loss=0.0998, beats_loss=0.01032, ecapa_loss=0.0001336, whisper_loss=0.08815, over 3750806.47 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:07:33,290 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 16 from Vox, 50 fro AS 2024-08-20 10:07:50,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4762980.0, ans=0.2 2024-08-20 10:08:02,026 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
14 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-20 10:08:05,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4763080.0, ans=0.125 2024-08-20 10:08:26,657 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 10:08:29,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4763180.0, ans=0.125 2024-08-20 10:08:38,810 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2150, loss[loss=0.1148, beats_loss=0.00974, ecapa_loss=0.0001301, whisper_loss=0.1038, over 17878.00 frames. ], tot_loss[loss=0.09912, beats_loss=0.01038, ecapa_loss=0.0001331, whisper_loss=0.08741, over 3748235.69 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:08:48,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4763280.0, ans=0.0 2024-08-20 10:08:51,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4763280.0, ans=0.2 2024-08-20 10:08:59,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-20 10:09:05,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.500e+01 2.856e+01 5.859e+01, threshold=5.000e+01, percent-clipped=1.0 2024-08-20 10:09:31,096 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 17 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 10:09:31,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4763580.0, ans=0.125 2024-08-20 10:09:34,568 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
13 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 10:09:51,505 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 10:09:53,126 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 17 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 10:10:05,305 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2200, loss[loss=0.1094, beats_loss=0.0108, ecapa_loss=0.0001392, whisper_loss=0.09717, over 24078.00 frames. ], tot_loss[loss=0.09909, beats_loss=0.01048, ecapa_loss=0.0001334, whisper_loss=0.08728, over 3758383.99 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:10:06,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4763780.0, ans=0.125 2024-08-20 10:10:06,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4763780.0, ans=0.04949747468305833 2024-08-20 10:10:28,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.75 vs. limit=22.5 2024-08-20 10:10:35,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4763880.0, ans=0.1 2024-08-20 10:10:40,392 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 10:11:02,362 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 10:11:02,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-08-20 10:11:12,044 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 23 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-20 10:11:13,488 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 10:11:22,077 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 33 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 10:11:30,010 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2250, loss[loss=0.1103, beats_loss=0.006728, ecapa_loss=0.0001597, whisper_loss=0.1019, over 18867.00 frames. ], tot_loss[loss=0.09959, beats_loss=0.01046, ecapa_loss=0.0001343, whisper_loss=0.08779, over 3737732.11 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:11:34,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4764280.0, ans=0.0 2024-08-20 10:11:41,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4764280.0, ans=0.125 2024-08-20 10:11:55,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.206e+01 2.415e+01 2.754e+01 4.736e+01, threshold=4.831e+01, percent-clipped=0.0 2024-08-20 10:11:56,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4764380.0, ans=0.1 2024-08-20 10:12:05,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4764480.0, ans=0.1 2024-08-20 10:12:08,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4764480.0, ans=0.125 2024-08-20 10:12:09,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4764480.0, ans=0.125 2024-08-20 10:12:12,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.42 vs. 
limit=15.0 2024-08-20 10:12:24,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4764580.0, ans=0.125 2024-08-20 10:12:32,766 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:12:46,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-08-20 10:12:55,605 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2300, loss[loss=0.1321, beats_loss=0.005735, ecapa_loss=0.000169, whisper_loss=0.1247, over 18958.00 frames. ], tot_loss[loss=0.09997, beats_loss=0.0104, ecapa_loss=0.0001358, whisper_loss=0.08821, over 3732369.60 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:13:13,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4764880.0, ans=0.125 2024-08-20 10:13:23,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.96 vs. limit=6.0 2024-08-20 10:13:44,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4764980.0, ans=0.0 2024-08-20 10:13:44,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4764980.0, ans=0.125 2024-08-20 10:14:11,124 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 10:14:15,849 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 10:14:16,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4765180.0, ans=0.1 2024-08-20 10:14:19,117 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 16 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 10:14:21,865 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2350, loss[loss=0.07615, beats_loss=0.01296, ecapa_loss=0.0001073, whisper_loss=0.06212, over 16429.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001369, whisper_loss=0.08938, over 3776339.93 frames. ], batch size: 65, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:14:26,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4765280.0, ans=0.0 2024-08-20 10:14:44,146 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 33 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-20 10:14:44,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4765380.0, ans=0.0 2024-08-20 10:14:47,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4765380.0, ans=0.0 2024-08-20 10:14:48,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.264e+01 2.559e+01 2.893e+01 5.027e+01, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 10:14:53,825 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 10:15:03,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4765480.0, ans=0.0 2024-08-20 10:15:05,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4765480.0, ans=0.015 2024-08-20 10:15:13,942 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 10:15:20,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.27 vs. limit=10.0 2024-08-20 10:15:24,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4765580.0, ans=0.2 2024-08-20 10:15:26,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4765580.0, ans=0.125 2024-08-20 10:15:46,596 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2400, loss[loss=0.09937, beats_loss=0.01044, ecapa_loss=0.0001651, whisper_loss=0.08727, over 15318.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001371, whisper_loss=0.08918, over 3761388.47 frames. ], batch size: 63, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:15:47,309 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 10:15:52,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4765780.0, ans=0.0 2024-08-20 10:15:59,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4765780.0, ans=0.1 2024-08-20 10:16:02,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4765880.0, ans=0.2 2024-08-20 10:16:02,901 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.822e+00 2024-08-20 10:16:50,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.95 vs. 
limit=15.0 2024-08-20 10:17:03,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0 2024-08-20 10:17:11,568 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2450, loss[loss=0.1166, beats_loss=0.00865, ecapa_loss=0.0001163, whisper_loss=0.1068, over 22969.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001363, whisper_loss=0.08939, over 3767141.08 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:17:24,884 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 10:17:25,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4766280.0, ans=0.0 2024-08-20 10:17:28,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4766380.0, ans=0.2 2024-08-20 10:17:30,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4766380.0, ans=0.125 2024-08-20 10:17:38,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4766380.0, ans=0.125 2024-08-20 10:17:38,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.311e+01 2.583e+01 2.758e+01 5.133e+01, threshold=5.165e+01, percent-clipped=1.0 2024-08-20 10:17:44,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4766380.0, ans=0.125 2024-08-20 10:17:46,002 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 10:17:53,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4766480.0, ans=0.0 2024-08-20 10:18:00,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4766480.0, ans=0.125 2024-08-20 10:18:20,431 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 26 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 10:18:32,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4766680.0, ans=0.0 2024-08-20 10:18:34,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4766680.0, ans=0.1 2024-08-20 10:18:40,987 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2500, loss[loss=0.1029, beats_loss=0.00834, ecapa_loss=0.0001523, whisper_loss=0.09303, over 20345.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001365, whisper_loss=0.08898, over 3802082.01 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:18:47,027 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 10:19:06,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4766880.0, ans=10.0 2024-08-20 10:19:08,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4766880.0, ans=0.0 2024-08-20 10:19:20,192 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
31 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-20 10:19:20,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4766980.0, ans=0.0 2024-08-20 10:19:27,504 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 10:19:38,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4767080.0, ans=0.125 2024-08-20 10:19:43,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4767080.0, ans=0.125 2024-08-20 10:19:45,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4767080.0, ans=0.125 2024-08-20 10:19:50,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2024-08-20 10:19:59,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=4767180.0, ans=0.1 2024-08-20 10:20:12,093 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2550, loss[loss=0.08963, beats_loss=0.008862, ecapa_loss=0.0001272, whisper_loss=0.0795, over 13718.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01043, ecapa_loss=0.0001361, whisper_loss=0.08942, over 3806198.75 frames. 
], batch size: 53, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:20:27,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4767280.0, ans=0.125 2024-08-20 10:20:39,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.312e+01 2.481e+01 2.687e+01 3.912e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 10:20:43,727 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:20:48,691 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 10:20:48,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4767480.0, ans=0.125 2024-08-20 10:21:10,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=4767580.0, ans=0.2 2024-08-20 10:21:42,281 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2600, loss[loss=0.07625, beats_loss=0.009743, ecapa_loss=0.0001624, whisper_loss=0.06488, over 18096.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001366, whisper_loss=0.08955, over 3843624.15 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:21:57,117 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 21 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-20 10:21:58,732 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 10:22:00,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4767880.0, ans=0.0 2024-08-20 10:22:14,267 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
16 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 10:22:16,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=4767980.0, ans=0.1 2024-08-20 10:22:16,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-20 10:22:23,263 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 10:22:26,716 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 10:22:26,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4767980.0, ans=0.125 2024-08-20 10:22:33,956 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 10:22:55,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4768180.0, ans=0.125 2024-08-20 10:23:10,967 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2650, loss[loss=0.09525, beats_loss=0.01064, ecapa_loss=0.0001177, whisper_loss=0.08344, over 20105.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001374, whisper_loss=0.08944, over 3824016.24 frames. ], batch size: 78, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:23:15,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.837e+05 2024-08-20 10:23:15,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4768280.0, ans=0.1 2024-08-20 10:23:28,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.40 vs. 
limit=22.5 2024-08-20 10:23:38,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.210e+01 2.428e+01 2.721e+01 4.084e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 10:24:06,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4768580.0, ans=0.0 2024-08-20 10:24:14,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4768580.0, ans=0.0 2024-08-20 10:24:24,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4768680.0, ans=0.2 2024-08-20 10:24:41,446 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2700, loss[loss=0.1067, beats_loss=0.009755, ecapa_loss=0.0001505, whisper_loss=0.09543, over 18646.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001387, whisper_loss=0.08924, over 3814551.06 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:24:44,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4768780.0, ans=0.0 2024-08-20 10:24:58,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.24 vs. limit=22.5 2024-08-20 10:25:16,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=12.0 2024-08-20 10:25:19,466 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 10:25:41,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4769080.0, ans=0.125 2024-08-20 10:26:05,687 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
15 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 10:26:09,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4769180.0, ans=0.0 2024-08-20 10:26:12,621 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2750, loss[loss=0.09089, beats_loss=0.009868, ecapa_loss=0.0001596, whisper_loss=0.07943, over 20528.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001389, whisper_loss=0.08885, over 3834919.32 frames. ], batch size: 86, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:26:19,786 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 10:26:24,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4769280.0, ans=0.125 2024-08-20 10:26:38,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.290e+01 2.547e+01 2.861e+01 3.965e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-20 10:26:51,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4769480.0, ans=0.125 2024-08-20 10:27:03,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0 2024-08-20 10:27:31,893 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 10:27:41,798 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2800, loss[loss=0.08773, beats_loss=0.00942, ecapa_loss=0.0001373, whisper_loss=0.07693, over 16270.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0104, ecapa_loss=0.0001394, whisper_loss=0.08902, over 3823597.36 frames. ], batch size: 61, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:27:42,037 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
18 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 10:28:51,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4770180.0, ans=0.09899494936611666 2024-08-20 10:29:02,788 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 10:29:04,328 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 10:29:10,471 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2850, loss[loss=0.07692, beats_loss=0.01316, ecapa_loss=8.227e-05, whisper_loss=0.06293, over 18048.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001379, whisper_loss=0.08901, over 3793568.62 frames. ], batch size: 69, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:29:16,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4770280.0, ans=0.0 2024-08-20 10:29:37,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.626e+01 2.251e+01 2.450e+01 2.765e+01 5.044e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-20 10:29:46,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4770480.0, ans=0.125 2024-08-20 10:29:51,988 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 15 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 10:30:17,640 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 27 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-20 10:30:19,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2024-08-20 10:30:25,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. 
limit=15.0 2024-08-20 10:30:38,583 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2900, loss[loss=0.08744, beats_loss=0.01238, ecapa_loss=0.0001096, whisper_loss=0.07396, over 22204.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01036, ecapa_loss=0.0001382, whisper_loss=0.08957, over 3809978.34 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:31:05,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=12.0 2024-08-20 10:31:12,709 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 32 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 10:31:14,744 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 10:31:16,523 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 10:31:49,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.38 vs. limit=15.0 2024-08-20 10:32:08,265 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 2950, loss[loss=0.1085, beats_loss=0.01067, ecapa_loss=0.0001454, whisper_loss=0.09634, over 21855.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.08954, over 3786803.59 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:32:08,448 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 38 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 10:32:13,711 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 10:32:24,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.36 vs. 
limit=10.0 2024-08-20 10:32:25,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4771380.0, ans=0.0 2024-08-20 10:32:35,047 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.380e+01 2.528e+01 2.893e+01 7.268e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-20 10:32:41,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4771380.0, ans=0.125 2024-08-20 10:33:07,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4771580.0, ans=0.0 2024-08-20 10:33:25,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4771680.0, ans=0.125 2024-08-20 10:33:37,803 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3000, loss[loss=0.0978, beats_loss=0.01312, ecapa_loss=0.0001224, whisper_loss=0.08346, over 15448.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001391, whisper_loss=0.08911, over 3762603.34 frames. ], batch size: 62, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:33:37,803 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 10:34:13,870 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on ASR_libri: loss=0.2557, beats_loss=0, ecapa_loss=0.0005125, whisper_loss=0.2506, over 931116.00 frames. 2024-08-20 10:34:36,567 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on SV_voxceleb1: loss=0.003928, beats_loss=0, ecapa_loss=0.0003928, whisper_loss=0, over 944235.00 frames. 2024-08-20 10:36:13,108 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 10:36:13,112 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 10:36:14,678 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 10:36:26,422 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 10:36:46,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4771980.0, ans=0.125 2024-08-20 10:36:48,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4771980.0, ans=0.1 2024-08-20 10:36:57,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4771980.0, ans=0.0 2024-08-20 10:37:00,329 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 28 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 10:37:02,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4772080.0, ans=0.1 2024-08-20 10:37:06,637 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 10:37:27,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4772180.0, ans=0.0 2024-08-20 10:37:29,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4772180.0, ans=0.1 2024-08-20 10:37:33,589 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3050, loss[loss=0.1189, beats_loss=0.01175, ecapa_loss=0.0001188, whisper_loss=0.106, over 15436.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001378, whisper_loss=0.08912, over 3751835.84 frames. 
], batch size: 62, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:37:36,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-20 10:37:56,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4772380.0, ans=0.125 2024-08-20 10:37:56,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=22.5 2024-08-20 10:37:58,955 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.325e+01 2.539e+01 2.982e+01 4.388e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-20 10:38:01,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4772380.0, ans=0.125 2024-08-20 10:38:02,207 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 10:38:04,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4772380.0, ans=0.0 2024-08-20 10:38:41,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4772680.0, ans=0.125 2024-08-20 10:38:44,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4772680.0, ans=0.125 2024-08-20 10:38:50,923 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 21 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-20 10:38:55,606 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3100, loss[loss=0.1066, beats_loss=0.008717, ecapa_loss=0.0001135, whisper_loss=0.09671, over 16858.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001375, whisper_loss=0.08967, over 3774703.73 frames. ], batch size: 62, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:39:04,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4772780.0, ans=0.125 2024-08-20 10:39:16,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4772880.0, ans=0.1 2024-08-20 10:39:32,191 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 10:39:47,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4773080.0, ans=0.125 2024-08-20 10:39:48,671 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 10:39:49,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.08 vs. limit=10.0 2024-08-20 10:39:50,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4773080.0, ans=0.125 2024-08-20 10:40:13,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4773180.0, ans=0.2 2024-08-20 10:40:17,703 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3150, loss[loss=0.09327, beats_loss=0.01255, ecapa_loss=0.0001548, whisper_loss=0.07917, over 21201.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001384, whisper_loss=0.09031, over 3797863.72 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:40:26,880 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 10:40:30,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4773280.0, ans=0.125 2024-08-20 10:40:31,452 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 10:40:36,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4773380.0, ans=0.125 2024-08-20 10:40:40,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4773380.0, ans=0.0 2024-08-20 10:40:42,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.206e+01 2.480e+01 2.972e+01 5.332e+01, threshold=4.960e+01, percent-clipped=1.0 2024-08-20 10:40:43,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4773380.0, ans=0.5 2024-08-20 10:41:19,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4773580.0, ans=0.125 2024-08-20 10:41:26,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4773680.0, ans=0.0 2024-08-20 10:41:26,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4773680.0, ans=0.125 2024-08-20 10:41:26,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. 
limit=6.0 2024-08-20 10:41:37,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4773780.0, ans=0.125 2024-08-20 10:41:38,133 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3200, loss[loss=0.0989, beats_loss=0.01263, ecapa_loss=0.0001153, whisper_loss=0.08512, over 22613.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.09031, over 3813099.29 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:41:39,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4773780.0, ans=0.125 2024-08-20 10:41:43,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4773780.0, ans=0.0 2024-08-20 10:41:45,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4773780.0, ans=0.125 2024-08-20 10:42:39,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4774080.0, ans=10.0 2024-08-20 10:42:43,966 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 10:42:59,586 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3250, loss[loss=0.1076, beats_loss=0.01049, ecapa_loss=0.0001376, whisper_loss=0.09574, over 14444.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.09109, over 3817993.16 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:43:03,230 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 10:43:10,705 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 10:43:25,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.267e+01 2.440e+01 2.710e+01 3.634e+01, threshold=4.881e+01, percent-clipped=0.0 2024-08-20 10:43:28,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4774380.0, ans=0.1 2024-08-20 10:44:08,363 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 10:44:25,480 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3300, loss[loss=0.08537, beats_loss=0.01277, ecapa_loss=0.0001306, whisper_loss=0.07129, over 20708.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01032, ecapa_loss=0.0001393, whisper_loss=0.092, over 3805844.23 frames. ], batch size: 85, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:44:43,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=4774880.0, ans=0.025 2024-08-20 10:44:45,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4774880.0, ans=0.0 2024-08-20 10:45:09,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4774980.0, ans=0.125 2024-08-20 10:45:22,816 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 8 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 10:45:26,210 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 14 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 10:45:36,027 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 10:45:39,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4775180.0, ans=0.2 2024-08-20 10:45:50,685 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3350, loss[loss=0.09065, beats_loss=0.01041, ecapa_loss=0.0001614, whisper_loss=0.07863, over 18042.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.09122, over 3803342.59 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:45:56,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4775280.0, ans=0.1 2024-08-20 10:45:58,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4775280.0, ans=0.125 2024-08-20 10:46:14,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.32 vs. limit=22.5 2024-08-20 10:46:16,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.283e+01 2.507e+01 2.665e+01 5.653e+01, threshold=5.015e+01, percent-clipped=1.0 2024-08-20 10:46:22,096 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 
20 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 10:46:22,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4775480.0, ans=0.05 2024-08-20 10:46:30,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4775480.0, ans=0.125 2024-08-20 10:46:40,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4775580.0, ans=0.0 2024-08-20 10:47:00,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4775680.0, ans=0.0 2024-08-20 10:47:07,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4775680.0, ans=0.125 2024-08-20 10:47:13,175 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3400, loss[loss=0.08418, beats_loss=0.01155, ecapa_loss=0.0001287, whisper_loss=0.07134, over 21151.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.0001402, whisper_loss=0.09118, over 3795768.40 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:47:19,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4775780.0, ans=0.2 2024-08-20 10:47:25,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4775780.0, ans=0.2 2024-08-20 10:47:37,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4775880.0, ans=0.125 2024-08-20 10:47:59,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.95 vs. limit=22.5 2024-08-20 10:48:12,250 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 10:48:36,718 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3450, loss[loss=0.1157, beats_loss=0.01058, ecapa_loss=0.0001112, whisper_loss=0.104, over 22571.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01026, ecapa_loss=0.0001399, whisper_loss=0.09194, over 3804339.86 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:48:44,098 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 10:48:58,341 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 10:48:58,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4776380.0, ans=0.0 2024-08-20 10:49:02,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.189e+01 2.533e+01 2.792e+01 4.546e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-20 10:49:03,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4776380.0, ans=0.95 2024-08-20 10:49:07,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4776380.0, ans=0.0 2024-08-20 10:49:10,190 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 10:49:27,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4776580.0, ans=0.1 2024-08-20 10:49:32,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4776580.0, ans=0.125 2024-08-20 10:49:34,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4776580.0, ans=0.0 2024-08-20 10:49:40,788 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 10:49:43,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=4776680.0, ans=12.0 2024-08-20 10:49:46,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4776680.0, ans=0.0 2024-08-20 10:49:57,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4776680.0, ans=0.2 2024-08-20 10:49:59,517 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3500, loss[loss=0.1465, beats_loss=0.00593, ecapa_loss=0.0001547, whisper_loss=0.139, over 19491.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.000141, whisper_loss=0.09118, over 3837234.16 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:50:09,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4776780.0, ans=0.0 2024-08-20 10:50:12,242 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
38 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 10:50:12,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4776780.0, ans=0.125 2024-08-20 10:50:21,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4776880.0, ans=0.1 2024-08-20 10:50:26,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4776880.0, ans=0.2 2024-08-20 10:50:32,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4776980.0, ans=0.0 2024-08-20 10:50:41,378 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 10:50:54,982 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 10:51:05,609 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 12 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-20 10:51:10,720 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 10:51:25,492 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3550, loss[loss=0.1249, beats_loss=0.009101, ecapa_loss=0.0001364, whisper_loss=0.1144, over 24531.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001422, whisper_loss=0.09083, over 3836111.32 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:51:34,914 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 10:51:35,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4777280.0, ans=0.04949747468305833 2024-08-20 10:51:40,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4777280.0, ans=0.0 2024-08-20 10:51:48,513 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-20 10:51:51,925 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 22 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 10:51:53,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.256e+01 2.446e+01 2.719e+01 3.472e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-20 10:51:53,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4777380.0, ans=0.125 2024-08-20 10:52:04,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4777480.0, ans=0.0 2024-08-20 10:52:08,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-20 10:52:36,846 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 12 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-20 10:52:39,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4777680.0, ans=0.125 2024-08-20 10:52:52,785 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3600, loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001336, whisper_loss=0.09035, over 22764.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001409, whisper_loss=0.09023, over 3807798.43 frames. 
], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:53:10,134 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 10:53:23,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2024-08-20 10:54:01,036 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 16 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 10:54:13,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-08-20 10:54:51,598 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3650, loss[loss=0.1092, beats_loss=0.01138, ecapa_loss=0.0001453, whisper_loss=0.09637, over 22618.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001409, whisper_loss=0.0906, over 3797396.79 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 10:55:21,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.03 vs. limit=10.0 2024-08-20 10:55:33,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.274e+01 2.510e+01 2.811e+01 1.402e+02, threshold=5.019e+01, percent-clipped=2.0 2024-08-20 10:55:38,353 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 10:55:50,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4778480.0, ans=0.125 2024-08-20 10:55:50,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4778480.0, ans=0.2 2024-08-20 10:56:06,357 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 10:56:33,571 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 10:56:38,092 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 10:56:45,296 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 27 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 10:56:53,787 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3700, loss[loss=0.1075, beats_loss=0.009077, ecapa_loss=0.0001506, whisper_loss=0.09694, over 21360.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001403, whisper_loss=0.0902, over 3781672.54 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 10:57:04,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4778780.0, ans=0.125 2024-08-20 10:57:29,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4778880.0, ans=0.125 2024-08-20 10:57:57,773 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 10:58:08,381 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 24 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 10:58:41,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4779180.0, ans=0.125 2024-08-20 10:58:44,833 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3750, loss[loss=0.09079, beats_loss=0.01135, ecapa_loss=0.0001496, whisper_loss=0.07795, over 21956.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001408, whisper_loss=0.09046, over 3783895.94 frames. 
], batch size: 94, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 10:58:48,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4779280.0, ans=0.1 2024-08-20 10:59:04,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4779280.0, ans=0.125 2024-08-20 10:59:28,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.202e+01 2.451e+01 2.797e+01 4.489e+01, threshold=4.903e+01, percent-clipped=0.0 2024-08-20 10:59:51,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4779480.0, ans=0.125 2024-08-20 11:00:03,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4779580.0, ans=0.125 2024-08-20 11:00:08,989 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 11:00:25,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4779680.0, ans=0.125 2024-08-20 11:00:44,439 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 36 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 11:00:46,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4779680.0, ans=0.1 2024-08-20 11:00:50,447 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3800, loss[loss=0.1194, beats_loss=0.009177, ecapa_loss=0.0001518, whisper_loss=0.1087, over 21794.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01022, ecapa_loss=0.0001412, whisper_loss=0.09124, over 3797785.25 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:00:51,104 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 11:01:01,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4779780.0, ans=0.1 2024-08-20 11:01:11,233 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 17 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-20 11:01:13,522 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 11:01:18,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4779880.0, ans=0.125 2024-08-20 11:01:42,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4779980.0, ans=0.025 2024-08-20 11:01:53,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-08-20 11:01:55,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4779980.0, ans=0.2 2024-08-20 11:02:00,754 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 11:02:06,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0 2024-08-20 11:02:08,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4780080.0, ans=0.125 2024-08-20 11:02:23,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4780080.0, ans=0.125 2024-08-20 11:02:57,342 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3850, loss[loss=0.09338, beats_loss=0.008624, ecapa_loss=0.0001934, whisper_loss=0.08282, over 21400.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001412, whisper_loss=0.0904, over 3827861.94 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:03:09,649 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 11:03:19,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4780380.0, ans=0.125 2024-08-20 11:03:26,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.48 vs. limit=22.5 2024-08-20 11:03:32,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.347e+01 2.624e+01 2.897e+01 4.079e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-20 11:03:43,684 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 20 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 11:04:07,544 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 11:04:13,107 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 
23 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 11:04:19,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4780680.0, ans=0.125 2024-08-20 11:04:19,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4780680.0, ans=0.0 2024-08-20 11:04:19,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4780680.0, ans=0.0 2024-08-20 11:04:31,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4780680.0, ans=0.125 2024-08-20 11:04:35,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4780680.0, ans=0.2 2024-08-20 11:04:37,556 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 11:04:38,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2024-08-20 11:04:40,747 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3900, loss[loss=0.07903, beats_loss=0.009571, ecapa_loss=0.0001594, whisper_loss=0.06786, over 18363.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01026, ecapa_loss=0.0001415, whisper_loss=0.09022, over 3847931.86 frames. 
], batch size: 73, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:04:43,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4780780.0, ans=0.125 2024-08-20 11:04:53,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4780780.0, ans=0.025 2024-08-20 11:04:55,174 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:05:34,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4780980.0, ans=0.0 2024-08-20 11:05:43,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2024-08-20 11:05:49,409 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 11:05:51,345 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 24 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-20 11:06:13,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4781180.0, ans=0.125 2024-08-20 11:06:15,890 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 11:06:19,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4781180.0, ans=0.125 2024-08-20 11:06:19,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4781180.0, ans=0.2 2024-08-20 11:06:23,369 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.632e+00 2024-08-20 11:06:30,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4781280.0, ans=0.125 2024-08-20 11:06:31,154 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 3950, loss[loss=0.1327, beats_loss=0.007464, ecapa_loss=0.0001416, whisper_loss=0.1238, over 22791.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01029, ecapa_loss=0.0001408, whisper_loss=0.0905, over 3816964.89 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:06:59,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-08-20 11:07:00,756 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 11:07:03,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4781380.0, ans=0.1 2024-08-20 11:07:07,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-20 11:07:08,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.311e+01 2.584e+01 2.900e+01 3.704e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-20 11:07:26,570 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
32 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 11:07:59,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4781680.0, ans=0.125 2024-08-20 11:08:12,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4781680.0, ans=0.1 2024-08-20 11:08:14,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4781780.0, ans=0.125 2024-08-20 11:08:15,340 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4000, loss[loss=0.1039, beats_loss=0.01143, ecapa_loss=0.0001097, whisper_loss=0.0914, over 22984.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001402, whisper_loss=0.09028, over 3829573.16 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:08:18,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4781780.0, ans=0.125 2024-08-20 11:09:31,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4782080.0, ans=0.07 2024-08-20 11:09:45,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4782180.0, ans=0.0 2024-08-20 11:09:54,858 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-20 11:09:57,110 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 20 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-20 11:10:07,433 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4050, loss[loss=0.1059, beats_loss=0.008363, ecapa_loss=0.0001394, whisper_loss=0.09617, over 13352.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.08994, over 3820485.53 frames. 
], batch size: 50, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:10:11,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4782280.0, ans=0.0 2024-08-20 11:10:21,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4782280.0, ans=0.2 2024-08-20 11:10:25,977 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-20 11:10:50,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.323e+01 2.567e+01 2.848e+01 3.981e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-20 11:11:09,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4782480.0, ans=0.1 2024-08-20 11:11:13,993 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 18 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-20 11:11:32,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5 2024-08-20 11:11:38,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-08-20 11:12:03,354 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 20 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 11:12:09,989 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4100, loss[loss=0.06936, beats_loss=0.009878, ecapa_loss=0.0001459, whisper_loss=0.05802, over 14599.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001403, whisper_loss=0.09003, over 3827643.74 frames. 
], batch size: 58, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:12:35,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4782880.0, ans=0.0 2024-08-20 11:12:41,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2024-08-20 11:12:59,052 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 35 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 11:13:20,203 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 11:13:25,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4783180.0, ans=0.1 2024-08-20 11:13:27,188 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 11:13:27,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4783180.0, ans=0.125 2024-08-20 11:13:38,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5 2024-08-20 11:13:41,078 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4150, loss[loss=0.1147, beats_loss=0.01151, ecapa_loss=0.0001284, whisper_loss=0.1019, over 22787.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001389, whisper_loss=0.09, over 3831182.83 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:14:04,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5 2024-08-20 11:14:08,591 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
23 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 11:14:11,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.670e+01 3.158e+01 1.265e+02, threshold=5.340e+01, percent-clipped=2.0 2024-08-20 11:14:17,010 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 13 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 11:14:18,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4783480.0, ans=0.125 2024-08-20 11:14:28,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0 2024-08-20 11:14:29,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4783480.0, ans=0.0 2024-08-20 11:14:29,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-08-20 11:14:45,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4783580.0, ans=0.125 2024-08-20 11:14:57,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4783680.0, ans=0.0 2024-08-20 11:15:02,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4783680.0, ans=0.035 2024-08-20 11:15:08,170 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4200, loss[loss=0.1064, beats_loss=0.01064, ecapa_loss=0.0001209, whisper_loss=0.0946, over 23961.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.09052, over 3834281.66 frames. 
], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:15:13,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4783780.0, ans=0.0 2024-08-20 11:15:26,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4783880.0, ans=0.125 2024-08-20 11:15:30,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4783880.0, ans=0.125 2024-08-20 11:15:33,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4783880.0, ans=0.2 2024-08-20 11:15:47,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4783980.0, ans=0.05 2024-08-20 11:16:02,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4784080.0, ans=0.2 2024-08-20 11:16:03,373 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 11:16:15,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=12.0 2024-08-20 11:16:37,538 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4250, loss[loss=0.1045, beats_loss=0.009144, ecapa_loss=0.0001237, whisper_loss=0.09408, over 18638.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001394, whisper_loss=0.09052, over 3808828.48 frames. 
], batch size: 73, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:16:43,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4784280.0, ans=0.2 2024-08-20 11:16:53,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4784280.0, ans=0.125 2024-08-20 11:17:01,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4784380.0, ans=0.125 2024-08-20 11:17:06,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4784380.0, ans=0.125 2024-08-20 11:17:07,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.233e+01 2.477e+01 2.798e+01 4.198e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-20 11:17:11,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4784480.0, ans=0.125 2024-08-20 11:17:13,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4784480.0, ans=0.0 2024-08-20 11:17:13,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4784480.0, ans=0.125 2024-08-20 11:17:53,919 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 26 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 11:17:55,648 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 15 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 11:17:57,130 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 11:18:05,740 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4300, loss[loss=0.1133, beats_loss=0.01027, ecapa_loss=0.0001382, whisper_loss=0.1016, over 23449.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001393, whisper_loss=0.09033, over 3822799.52 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:18:05,926 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 19 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-20 11:18:13,982 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:18:34,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4784880.0, ans=0.125 2024-08-20 11:18:47,697 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 19 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-20 11:18:56,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-08-20 11:19:01,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4785080.0, ans=0.0 2024-08-20 11:19:32,888 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4350, loss[loss=0.1108, beats_loss=0.0111, ecapa_loss=0.0001399, whisper_loss=0.0983, over 17668.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001387, whisper_loss=0.08969, over 3782907.15 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:19:33,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4785280.0, ans=0.125 2024-08-20 11:19:41,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-08-20 11:19:59,320 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 11:20:02,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.221e+01 2.524e+01 2.843e+01 5.218e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-20 11:20:11,488 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 11:20:14,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=15.0 2024-08-20 11:20:36,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=15.0 2024-08-20 11:20:55,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4785680.0, ans=0.125 2024-08-20 11:20:55,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-20 11:21:01,241 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4400, loss[loss=0.09002, beats_loss=0.01079, ecapa_loss=0.000139, whisper_loss=0.07783, over 16116.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01059, ecapa_loss=0.0001384, whisper_loss=0.08889, over 3783553.71 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:21:05,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4785780.0, ans=0.125 2024-08-20 11:21:21,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4785880.0, ans=0.125 2024-08-20 11:21:25,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.96 vs. 
limit=15.0 2024-08-20 11:21:31,742 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 26 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 11:21:32,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4785880.0, ans=0.1 2024-08-20 11:21:57,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4786080.0, ans=0.2 2024-08-20 11:22:12,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4786180.0, ans=0.95 2024-08-20 11:22:17,821 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:22:21,806 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 11 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 11:22:30,826 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4450, loss[loss=0.1078, beats_loss=0.009948, ecapa_loss=0.0001262, whisper_loss=0.09662, over 23091.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01053, ecapa_loss=0.0001384, whisper_loss=0.08897, over 3776603.01 frames. 
], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:22:32,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4786280.0, ans=0.025 2024-08-20 11:22:41,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4786280.0, ans=0.1 2024-08-20 11:22:50,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4786380.0, ans=0.125 2024-08-20 11:22:58,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4786380.0, ans=0.1 2024-08-20 11:23:00,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.276e+01 2.505e+01 2.862e+01 6.840e+01, threshold=5.011e+01, percent-clipped=2.0 2024-08-20 11:23:12,633 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 32 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 11:23:35,164 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 11:23:49,180 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 13 from Vox, 48 fro AS 2024-08-20 11:23:49,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4786680.0, ans=0.125 2024-08-20 11:23:49,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4786680.0, ans=0.125 2024-08-20 11:23:51,708 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-20 11:23:55,209 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
20 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-20 11:23:57,873 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4500, loss[loss=0.08859, beats_loss=0.01085, ecapa_loss=0.0001464, whisper_loss=0.07628, over 19495.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.000139, whisper_loss=0.08944, over 3787669.65 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:24:05,988 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 11:24:06,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4786780.0, ans=0.125 2024-08-20 11:24:40,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4786980.0, ans=0.125 2024-08-20 11:24:42,381 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06896068155765533, model_norm_threshold=50.106773376464844 2024-08-20 11:24:42,570 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.994e+04, grad_sumsq=4.994e+04, orig_rms_sq=1.000e+00 2024-08-20 11:24:43,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=15.0 2024-08-20 11:25:13,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4787180.0, ans=0.0 2024-08-20 11:25:20,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.42 vs. 
limit=10.0 2024-08-20 11:25:25,029 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4550, loss[loss=0.1034, beats_loss=0.01105, ecapa_loss=0.0001317, whisper_loss=0.09103, over 14099.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08971, over 3781470.31 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:25:35,305 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 11:25:56,151 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.635e+01 2.231e+01 2.496e+01 2.825e+01 7.266e+02, threshold=4.992e+01, percent-clipped=1.0 2024-08-20 11:26:02,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4787480.0, ans=0.0 2024-08-20 11:26:15,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4787480.0, ans=0.0 2024-08-20 11:26:40,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2024-08-20 11:26:55,708 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4600, loss[loss=0.1024, beats_loss=0.009231, ecapa_loss=0.000153, whisper_loss=0.09167, over 21227.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001397, whisper_loss=0.08968, over 3798530.74 frames. ], batch size: 81, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:27:04,855 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 11:27:44,732 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 11:27:51,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.34 vs. 
limit=15.0 2024-08-20 11:27:55,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4788080.0, ans=0.125 2024-08-20 11:27:58,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4788080.0, ans=0.1 2024-08-20 11:28:09,860 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 11:28:15,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4788180.0, ans=0.2 2024-08-20 11:28:20,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4788180.0, ans=0.1 2024-08-20 11:28:23,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4788180.0, ans=0.125 2024-08-20 11:28:26,278 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4650, loss[loss=0.09271, beats_loss=0.01211, ecapa_loss=0.0001225, whisper_loss=0.07938, over 16090.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001393, whisper_loss=0.0895, over 3836275.42 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:28:34,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4788280.0, ans=0.125 2024-08-20 11:28:45,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4788380.0, ans=0.1 2024-08-20 11:28:56,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.578e+01 2.344e+01 2.573e+01 2.789e+01 3.818e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-20 11:28:59,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4788380.0, ans=0.025 2024-08-20 11:29:08,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2024-08-20 11:29:22,400 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 29 from LS+wenet, 27 from Vox, 18 fro AS 2024-08-20 11:29:25,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4788580.0, ans=0.125 2024-08-20 11:29:44,353 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 19 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 11:29:56,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2024-08-20 11:29:56,541 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4700, loss[loss=0.101, beats_loss=0.009204, ecapa_loss=0.0001462, whisper_loss=0.09037, over 16146.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001401, whisper_loss=0.09014, over 3822334.17 frames. 
], batch size: 62, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:30:01,823 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 11:30:07,401 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 11:30:27,631 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 11:30:46,942 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 11:30:47,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-20 11:30:56,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-20 11:31:22,463 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4750, loss[loss=0.0875, beats_loss=0.01376, ecapa_loss=0.0001261, whisper_loss=0.07248, over 21922.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.0894, over 3804568.54 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:31:24,308 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 11:31:26,079 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
25 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 11:31:52,611 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.256e+01 2.490e+01 2.747e+01 3.725e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 11:32:20,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=4789580.0, ans=15.0 2024-08-20 11:32:55,391 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4800, loss[loss=0.07093, beats_loss=0.01213, ecapa_loss=8.395e-05, whisper_loss=0.05795, over 15229.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001411, whisper_loss=0.08992, over 3775364.02 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:33:13,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4789880.0, ans=0.2 2024-08-20 11:33:48,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-20 11:34:06,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4790080.0, ans=0.0 2024-08-20 11:34:42,510 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4850, loss[loss=0.07665, beats_loss=0.01193, ecapa_loss=0.000133, whisper_loss=0.06339, over 17753.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0103, ecapa_loss=0.0001406, whisper_loss=0.09015, over 3807807.30 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:35:05,896 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 23 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 11:35:19,520 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 11:35:27,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.360e+01 2.620e+01 2.940e+01 4.009e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-20 11:35:46,430 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.475e+00 2024-08-20 11:35:46,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-08-20 11:36:09,043 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 11:36:16,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4790580.0, ans=0.0 2024-08-20 11:36:20,569 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 23 from LS+wenet, 11 from Vox, 18 fro AS 2024-08-20 11:36:30,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-20 11:36:51,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4790680.0, ans=0.0 2024-08-20 11:36:57,851 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4900, loss[loss=0.0752, beats_loss=0.01202, ecapa_loss=0.0001252, whisper_loss=0.06193, over 22508.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.08982, over 3841684.84 frames. 
], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:37:05,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4790780.0, ans=0.125 2024-08-20 11:37:10,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4790780.0, ans=0.025 2024-08-20 11:37:28,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-20 11:37:42,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4790880.0, ans=0.125 2024-08-20 11:38:20,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-08-20 11:38:27,973 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 11:38:34,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4791080.0, ans=0.125 2024-08-20 11:38:52,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4791180.0, ans=0.0 2024-08-20 11:38:58,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=12.0 2024-08-20 11:39:11,007 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 4950, loss[loss=0.09891, beats_loss=0.009989, ecapa_loss=0.00014, whisper_loss=0.08752, over 16105.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001401, whisper_loss=0.09009, over 3844921.38 frames. 
], batch size: 63, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:39:11,246 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 11:39:24,429 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 36 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-20 11:39:34,504 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 25 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 11:39:54,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.340e+01 2.516e+01 2.750e+01 4.505e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-20 11:40:12,417 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 11:40:24,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=22.5 2024-08-20 11:40:24,707 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 17 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 11:40:38,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4791580.0, ans=0.125 2024-08-20 11:40:43,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4791580.0, ans=0.1 2024-08-20 11:41:16,279 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5000, loss[loss=0.1076, beats_loss=0.01025, ecapa_loss=0.0001257, whisper_loss=0.09609, over 23378.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.0001401, whisper_loss=0.09033, over 3837340.11 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:41:21,504 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 11:41:29,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4791780.0, ans=0.0 2024-08-20 11:41:39,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-08-20 11:41:54,269 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.908e+00 2024-08-20 11:41:56,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4791880.0, ans=0.125 2024-08-20 11:41:57,899 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-20 11:42:32,599 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 17 from Vox, 49 fro AS 2024-08-20 11:42:36,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4792080.0, ans=0.07 2024-08-20 11:43:06,950 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 29 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 11:43:18,655 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5050, loss[loss=0.07817, beats_loss=0.0139, ecapa_loss=0.0001431, whisper_loss=0.06284, over 20166.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.0001406, whisper_loss=0.09037, over 3819931.43 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:43:27,103 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 11:43:30,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4792280.0, ans=0.125 2024-08-20 11:43:37,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4792280.0, ans=0.0 2024-08-20 11:43:44,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4792380.0, ans=0.0 2024-08-20 11:43:59,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.312e+01 2.479e+01 2.896e+01 1.864e+02, threshold=4.958e+01, percent-clipped=1.0 2024-08-20 11:44:01,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4792380.0, ans=0.125 2024-08-20 11:44:21,649 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 11:44:24,073 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 11:44:34,666 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2024-08-20 11:44:37,981 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 14 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 11:45:16,441 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5100, loss[loss=0.1059, beats_loss=0.01021, ecapa_loss=0.0001471, whisper_loss=0.09417, over 14766.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.09039, over 3787681.14 frames. 
], batch size: 59, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:45:44,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4792880.0, ans=0.1 2024-08-20 11:45:49,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4792880.0, ans=0.125 2024-08-20 11:46:00,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4792880.0, ans=0.125 2024-08-20 11:46:42,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4793080.0, ans=0.1 2024-08-20 11:46:59,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4793180.0, ans=0.04949747468305833 2024-08-20 11:47:16,822 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 11:47:19,411 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5150, loss[loss=0.1048, beats_loss=0.01068, ecapa_loss=0.0001639, whisper_loss=0.09244, over 21461.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001413, whisper_loss=0.09082, over 3816116.43 frames. 
], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:47:20,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4793280.0, ans=0.2 2024-08-20 11:47:34,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4793280.0, ans=0.0 2024-08-20 11:47:56,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4793380.0, ans=0.125 2024-08-20 11:48:01,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.291e+01 2.565e+01 2.823e+01 3.881e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 11:48:02,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4793380.0, ans=0.125 2024-08-20 11:48:10,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4793480.0, ans=0.0 2024-08-20 11:48:22,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4793480.0, ans=0.0 2024-08-20 11:48:25,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4793480.0, ans=0.04949747468305833 2024-08-20 11:48:30,856 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 11:48:34,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4793580.0, ans=0.0 2024-08-20 11:49:21,144 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 19 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-20 11:49:23,406 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5200, loss[loss=0.09517, beats_loss=0.008239, ecapa_loss=0.0001405, whisper_loss=0.08552, over 16267.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001407, whisper_loss=0.09073, over 3839277.46 frames. ], batch size: 65, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:49:44,288 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 11:49:53,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4793880.0, ans=0.1 2024-08-20 11:50:18,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4793980.0, ans=0.025 2024-08-20 11:50:32,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4793980.0, ans=0.1 2024-08-20 11:50:44,020 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 17 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-20 11:50:52,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4794080.0, ans=0.0 2024-08-20 11:51:11,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4794180.0, ans=0.07 2024-08-20 11:51:18,025 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 20 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-20 11:51:22,545 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 14 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 11:51:27,316 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5250, loss[loss=0.09943, beats_loss=0.0112, ecapa_loss=0.0001507, whisper_loss=0.08672, over 21520.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001415, whisper_loss=0.09052, over 3850554.23 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:51:29,676 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 
21 from LS+wenet, 10 from Vox, 21 fro AS 2024-08-20 11:51:44,378 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 16 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 11:51:52,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4794380.0, ans=0.2 2024-08-20 11:51:56,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4794380.0, ans=0.2 2024-08-20 11:52:02,523 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 11:52:08,188 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 11:52:11,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.300e+01 2.574e+01 2.899e+01 1.239e+02, threshold=5.148e+01, percent-clipped=2.0 2024-08-20 11:53:02,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4794580.0, ans=0.1 2024-08-20 11:53:09,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4794680.0, ans=0.125 2024-08-20 11:53:31,839 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5300, loss[loss=0.08261, beats_loss=0.008245, ecapa_loss=0.0001154, whisper_loss=0.07321, over 13215.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001414, whisper_loss=0.09038, over 3834424.06 frames. ], batch size: 50, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:53:47,109 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 11:53:51,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. 
limit=10.0 2024-08-20 11:53:52,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4794780.0, ans=0.2 2024-08-20 11:54:08,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4794880.0, ans=0.125 2024-08-20 11:54:33,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4794980.0, ans=0.1 2024-08-20 11:54:52,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4795080.0, ans=0.125 2024-08-20 11:54:56,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4795080.0, ans=0.125 2024-08-20 11:55:19,396 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 11:55:29,317 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5350, loss[loss=0.09915, beats_loss=0.01049, ecapa_loss=0.0001414, whisper_loss=0.08725, over 22974.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.08954, over 3840739.71 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:55:36,712 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 11:55:38,591 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 11:56:10,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.312e+01 2.539e+01 2.746e+01 3.720e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-20 11:56:22,756 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
17 from LS+wenet, 18 from Vox, 35 from AS 2024-08-20 11:56:54,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4795580.0, ans=0.125 2024-08-20 11:56:59,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=12.0 2024-08-20 11:57:10,449 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 from AS 2024-08-20 11:57:27,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4795680.0, ans=0.125 2024-08-20 11:57:30,443 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 21 from LS+wenet, 8 from Vox, 26 from AS 2024-08-20 11:57:32,242 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5400, loss[loss=0.1087, beats_loss=0.01064, ecapa_loss=0.0001305, whisper_loss=0.09671, over 13771.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001419, whisper_loss=0.08951, over 3832175.49 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:57:45,515 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 from AS 2024-08-20 11:57:49,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4795780.0, ans=0.125 2024-08-20 11:58:00,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=12.0 2024-08-20 11:58:15,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=12.0 2024-08-20 11:58:43,242 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
15 from LS+wenet, 15 from Vox, 30 from AS 2024-08-20 11:59:14,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4796180.0, ans=0.2 2024-08-20 11:59:19,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4796180.0, ans=0.2 2024-08-20 11:59:27,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-20 11:59:35,897 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5450, loss[loss=0.0875, beats_loss=0.01064, ecapa_loss=0.0001346, whisper_loss=0.07552, over 17958.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.08922, over 3822895.82 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:59:54,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=12.0 2024-08-20 12:00:00,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4796280.0, ans=0.1 2024-08-20 12:00:11,997 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-20 12:00:14,428 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
23 from LS+wenet, 26 from Vox, 38 from AS 2024-08-20 12:00:18,905 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.281e+01 2.461e+01 2.740e+01 4.925e+01, threshold=4.922e+01, percent-clipped=0.0 2024-08-20 12:00:28,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4796480.0, ans=0.125 2024-08-20 12:00:50,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4796580.0, ans=0.0 2024-08-20 12:01:20,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4796680.0, ans=0.2 2024-08-20 12:01:22,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4796680.0, ans=0.125 2024-08-20 12:01:32,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4796680.0, ans=0.09899494936611666 2024-08-20 12:01:43,297 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5500, loss[loss=0.09275, beats_loss=0.01151, ecapa_loss=0.0001285, whisper_loss=0.07995, over 23092.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.00014, whisper_loss=0.09003, over 3825917.85 frames. ], batch size: 94, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:01:54,052 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 18 from LS+wenet, 20 from Vox, 35 from AS 2024-08-20 12:02:41,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4796980.0, ans=0.125 2024-08-20 12:02:43,022 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
32 from LS+wenet, 23 from Vox, 27 from AS 2024-08-20 12:02:47,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4796980.0, ans=0.125 2024-08-20 12:03:11,690 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 28 from Vox, 30 from AS 2024-08-20 12:03:24,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4797180.0, ans=0.125 2024-08-20 12:03:45,338 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5550, loss[loss=0.08613, beats_loss=0.01136, ecapa_loss=0.000108, whisper_loss=0.07369, over 17978.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.08965, over 3808185.77 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:03:46,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4797280.0, ans=0.125 2024-08-20 12:03:46,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4797280.0, ans=0.025 2024-08-20 12:03:54,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4797280.0, ans=0.125 2024-08-20 12:04:25,497 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 from AS 2024-08-20 12:04:31,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.285e+01 2.426e+01 2.728e+01 7.340e+01, threshold=4.852e+01, percent-clipped=2.0 2024-08-20 12:04:57,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.75 vs. 
limit=15.0 2024-08-20 12:05:05,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4797580.0, ans=0.1 2024-08-20 12:05:08,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4797580.0, ans=0.1 2024-08-20 12:05:10,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4797580.0, ans=0.0 2024-08-20 12:05:48,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4797680.0, ans=0.1 2024-08-20 12:05:54,002 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5600, loss[loss=0.1119, beats_loss=0.0109, ecapa_loss=0.0001336, whisper_loss=0.09968, over 23218.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001385, whisper_loss=0.08968, over 3803009.08 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:06:29,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4797880.0, ans=0.125 2024-08-20 12:06:37,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4797880.0, ans=0.125 2024-08-20 12:07:21,537 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 from AS 2024-08-20 12:08:00,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4798180.0, ans=0.0 2024-08-20 12:08:03,454 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 23 from LS+wenet, 21 from Vox, 27 from AS 2024-08-20 12:08:08,259 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5650, loss[loss=0.1015, beats_loss=0.009717, ecapa_loss=0.0001354, whisper_loss=0.09046, over 14926.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001399, whisper_loss=0.09031, over 3791348.77 frames. ], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:08:51,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.277e+01 2.490e+01 2.736e+01 3.914e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-20 12:09:18,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4798480.0, ans=0.2 2024-08-20 12:09:35,226 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 12:09:45,696 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 from AS 2024-08-20 12:10:10,233 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5700, loss[loss=0.1051, beats_loss=0.01113, ecapa_loss=0.0001239, whisper_loss=0.09275, over 22231.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001397, whisper_loss=0.09006, over 3812098.71 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:10:12,771 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 28 from LS+wenet, 28 from Vox, 38 from AS 2024-08-20 12:10:29,743 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 17 from LS+wenet, 16 from Vox, 19 from AS 2024-08-20 12:10:32,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4798880.0, ans=0.125 2024-08-20 12:10:41,997 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 11 from LS+wenet, 15 from Vox, 24 from AS 2024-08-20 12:10:49,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.28 vs. 
limit=22.5 2024-08-20 12:10:54,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4798980.0, ans=0.125 2024-08-20 12:11:42,727 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 12:11:51,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4799180.0, ans=0.2 2024-08-20 12:12:07,828 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 19 from LS+wenet, 30 from Vox, 32 from AS 2024-08-20 12:12:09,997 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5750, loss[loss=0.07892, beats_loss=0.009834, ecapa_loss=0.0001612, whisper_loss=0.06748, over 19298.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08969, over 3848649.15 frames. ], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:12:34,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2024-08-20 12:12:51,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.328e+01 2.562e+01 2.759e+01 3.925e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-20 12:12:56,197 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 from AS 2024-08-20 12:13:00,560 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 from AS 2024-08-20 12:13:02,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4799480.0, ans=0.0 2024-08-20 12:13:27,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. 
limit=15.0 2024-08-20 12:13:51,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4799680.0, ans=0.125 2024-08-20 12:14:10,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4799680.0, ans=0.025 2024-08-20 12:14:13,752 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5800, loss[loss=0.09982, beats_loss=0.00882, ecapa_loss=0.0001273, whisper_loss=0.08973, over 18650.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.08951, over 3873701.46 frames. ], batch size: 70, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:14:22,197 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 25 from LS+wenet, 23 from Vox, 27 from AS 2024-08-20 12:14:24,817 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 from AS 2024-08-20 12:14:51,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4799880.0, ans=0.125 2024-08-20 12:15:16,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0 2024-08-20 12:15:27,754 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 from AS 2024-08-20 12:15:51,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4800080.0, ans=0.07 2024-08-20 12:16:01,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4800180.0, ans=15.0 2024-08-20 12:16:18,528 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5850, loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001342, whisper_loss=0.0903, over 21056.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.000139, whisper_loss=0.09002, over 3894349.98 frames. ], batch size: 85, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:16:54,427 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 16 from LS+wenet, 16 from Vox, 35 from AS 2024-08-20 12:16:58,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.301e+01 2.506e+01 2.861e+01 6.399e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-20 12:17:07,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4800480.0, ans=0.1 2024-08-20 12:17:13,171 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 24 from LS+wenet, 24 from Vox, 28 from AS 2024-08-20 12:17:15,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-08-20 12:17:28,021 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.924e-02 2024-08-20 12:17:29,003 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 from AS 2024-08-20 12:17:38,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4800580.0, ans=0.1 2024-08-20 12:18:04,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4800680.0, ans=0.2 2024-08-20 12:18:15,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4800780.0, ans=0.0 2024-08-20 12:18:16,848 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5900, loss[loss=0.1106, beats_loss=0.009887, ecapa_loss=0.0001464, whisper_loss=0.09924, over 17311.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001398, whisper_loss=0.08981, over 3863043.86 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:18:18,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4800780.0, ans=0.0 2024-08-20 12:18:52,074 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 21 from Vox, 37 from AS 2024-08-20 12:18:58,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4800880.0, ans=10.0 2024-08-20 12:19:13,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4800980.0, ans=0.5 2024-08-20 12:19:16,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4800980.0, ans=0.09899494936611666 2024-08-20 12:19:26,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2024-08-20 12:19:26,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2024-08-20 12:19:47,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4801080.0, ans=0.025 2024-08-20 12:19:49,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=12.0 2024-08-20 12:20:15,125 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 5950, loss[loss=0.1077, beats_loss=0.0103, ecapa_loss=0.0001596, whisper_loss=0.09583, over 22372.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08958, over 3847564.10 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:20:22,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4801280.0, ans=0.125 2024-08-20 12:20:47,820 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 from AS 2024-08-20 12:20:55,763 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.276e+01 2.504e+01 2.875e+01 3.990e+01, threshold=5.008e+01, percent-clipped=0.0 2024-08-20 12:21:02,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4801480.0, ans=0.125 2024-08-20 12:21:03,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=22.5 2024-08-20 12:21:06,035 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 24 from LS+wenet, 18 from Vox, 25 from AS 2024-08-20 12:21:58,317 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 25 from LS+wenet, 22 from Vox, 23 from AS 2024-08-20 12:22:09,481 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6000, loss[loss=0.1032, beats_loss=0.01034, ecapa_loss=0.0001145, whisper_loss=0.09175, over 18239.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.08972, over 3833214.95 frames. ], batch size: 69, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:22:09,481 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 12:22:45,772 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2489, over 931116.00 frames. 
2024-08-20 12:23:08,937 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on SV_voxceleb1: loss=0.003913, beats_loss=0, ecapa_loss=0.0003913, whisper_loss=0, over 944235.00 frames. 2024-08-20 12:24:20,834 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.4145e-10, 1.1671e-02, 1.2212e-03, 3.0790e+00, 1.0758e-03, 5.9943e-02, 3.3132e-02, 5.0175e-02], device='cuda:3') 2024-08-20 12:24:44,164 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 12:24:44,168 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 12:24:48,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2024-08-20 12:24:49,629 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 from AS 2024-08-20 12:24:58,094 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 16 from LS+wenet, 24 from Vox, 32 from AS 2024-08-20 12:25:14,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2024-08-20 12:25:18,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4801880.0, ans=0.0 2024-08-20 12:25:21,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4801880.0, ans=0.125 2024-08-20 12:25:22,322 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 from AS 2024-08-20 12:25:24,065 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
23 from LS+wenet, 20 from Vox, 39 from AS 2024-08-20 12:25:36,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.11 vs. limit=22.5 2024-08-20 12:26:02,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4802080.0, ans=0.125 2024-08-20 12:26:11,444 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 from AS 2024-08-20 12:26:19,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4802180.0, ans=0.125 2024-08-20 12:26:20,646 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 15 from LS+wenet, 13 from Vox, 36 from AS 2024-08-20 12:26:36,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4802280.0, ans=0.1 2024-08-20 12:26:36,932 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6050, loss[loss=0.1112, beats_loss=0.0105, ecapa_loss=0.0001341, whisper_loss=0.09932, over 22284.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001395, whisper_loss=0.08876, over 3809028.19 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:26:51,253 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 from AS 2024-08-20 12:26:58,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4802380.0, ans=0.125 2024-08-20 12:27:07,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4802380.0, ans=0.125 2024-08-20 12:27:14,864 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
20 from LS+wenet, 20 from Vox, 38 from AS 2024-08-20 12:27:17,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.267e+01 2.529e+01 2.771e+01 4.356e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-20 12:27:54,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-08-20 12:28:23,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4802680.0, ans=0.125 2024-08-20 12:28:35,570 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6100, loss[loss=0.1202, beats_loss=0.01023, ecapa_loss=0.0001193, whisper_loss=0.1088, over 22926.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001393, whisper_loss=0.08906, over 3837686.09 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:28:40,156 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 17 from LS+wenet, 18 from Vox, 18 from AS 2024-08-20 12:29:01,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4802880.0, ans=0.125 2024-08-20 12:29:28,830 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 15 from LS+wenet, 15 from Vox, 29 from AS 2024-08-20 12:29:35,597 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 35 from LS+wenet, 19 from Vox, 29 from AS 2024-08-20 12:29:43,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. 
limit=12.0 2024-08-20 12:29:50,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4803080.0, ans=0.125 2024-08-20 12:30:04,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2024-08-20 12:30:09,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-08-20 12:30:20,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-20 12:30:22,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4803180.0, ans=0.1 2024-08-20 12:30:29,501 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6150, loss[loss=0.1081, beats_loss=0.01027, ecapa_loss=0.0001311, whisper_loss=0.09648, over 21485.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01057, ecapa_loss=0.0001379, whisper_loss=0.08871, over 3824873.43 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:30:51,246 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 17 from LS+wenet, 13 from Vox, 30 from AS 2024-08-20 12:31:07,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.293e+01 2.457e+01 2.787e+01 2.276e+02, threshold=4.913e+01, percent-clipped=4.0 2024-08-20 12:31:14,413 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 from AS 2024-08-20 12:31:25,128 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
31 from LS+wenet, 12 from Vox, 49 from AS 2024-08-20 12:31:26,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4803480.0, ans=0.125 2024-08-20 12:31:55,897 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 14 from LS+wenet, 11 from Vox, 25 from AS 2024-08-20 12:32:00,480 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS 2024-08-20 12:32:23,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4803680.0, ans=0.125 2024-08-20 12:32:26,735 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6200, loss[loss=0.1053, beats_loss=0.0127, ecapa_loss=0.0001188, whisper_loss=0.09137, over 23661.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01064, ecapa_loss=0.0001367, whisper_loss=0.08897, over 3825998.92 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:32:33,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4803780.0, ans=0.1 2024-08-20 12:32:43,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2024-08-20 12:32:50,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-20 12:33:00,557 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 from AS 2024-08-20 12:33:27,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.10 vs. limit=22.5 2024-08-20 12:33:33,449 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
15 from LS+wenet, 21 from Vox, 19 from AS 2024-08-20 12:33:41,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4804080.0, ans=0.2 2024-08-20 12:33:45,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4804080.0, ans=0.125 2024-08-20 12:34:10,576 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-20 12:34:21,877 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6250, loss[loss=0.1013, beats_loss=0.009073, ecapa_loss=0.0001559, whisper_loss=0.09065, over 18158.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01073, ecapa_loss=0.0001375, whisper_loss=0.08855, over 3837502.73 frames. ], batch size: 74, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:34:26,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2024-08-20 12:34:31,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4804280.0, ans=0.125 2024-08-20 12:34:55,654 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 from AS 2024-08-20 12:35:02,554 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.372e+01 2.624e+01 2.911e+01 4.545e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-20 12:35:10,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4804480.0, ans=0.125 2024-08-20 12:35:35,602 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 
28 from LS+wenet, 25 from Vox, 30 from AS 2024-08-20 12:35:36,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4804580.0, ans=0.125 2024-08-20 12:35:38,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4804580.0, ans=0.2 2024-08-20 12:36:17,588 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6300, loss[loss=0.09611, beats_loss=0.01067, ecapa_loss=0.0001422, whisper_loss=0.08402, over 21892.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01068, ecapa_loss=0.0001384, whisper_loss=0.08941, over 3843956.53 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:36:44,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.92 vs. limit=10.0 2024-08-20 12:36:46,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2024-08-20 12:36:50,557 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-20 12:36:58,331 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 11 from Vox, 33 from AS 2024-08-20 12:37:01,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4804880.0, ans=0.125 2024-08-20 12:37:18,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.35 vs. limit=5.0 2024-08-20 12:37:25,303 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 from AS 2024-08-20 12:37:36,831 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
17 from LS+wenet, 15 from Vox, 19 from AS 2024-08-20 12:37:45,643 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 from AS 2024-08-20 12:38:13,162 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6350, loss[loss=0.07793, beats_loss=0.01361, ecapa_loss=8.391e-05, whisper_loss=0.06348, over 15511.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001381, whisper_loss=0.08982, over 3865293.22 frames. ], batch size: 59, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:38:14,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-20 12:38:26,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4805280.0, ans=0.125 2024-08-20 12:38:33,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4805280.0, ans=0.0 2024-08-20 12:38:53,960 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.243e+01 2.449e+01 2.846e+01 7.911e+01, threshold=4.899e+01, percent-clipped=2.0 2024-08-20 12:39:05,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4805480.0, ans=0.125 2024-08-20 12:39:48,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4805680.0, ans=0.0 2024-08-20 12:40:15,138 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6400, loss[loss=0.08056, beats_loss=0.01191, ecapa_loss=0.0001054, whisper_loss=0.0676, over 18804.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01063, ecapa_loss=0.0001384, whisper_loss=0.08906, over 3844117.55 frames. 
], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:40:25,216 WARNING [optim.py:496] (3/4) Scaling gradients by 0.015770763158798218, model_norm_threshold=48.98588180541992 2024-08-20 12:40:25,384 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.332e+06, grad_sumsq=1.479e+05, orig_rms_sq=9.003e+00 2024-08-20 12:40:27,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4805780.0, ans=0.0 2024-08-20 12:40:38,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4805880.0, ans=10.0 2024-08-20 12:41:01,889 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 18 from LS+wenet, 12 from Vox, 40 from AS 2024-08-20 12:41:07,325 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 15 from Vox, 48 from AS 2024-08-20 12:41:12,163 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 16 from LS+wenet, 26 from Vox, 35 from AS 2024-08-20 12:41:19,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0 2024-08-20 12:42:10,951 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6450, loss[loss=0.09934, beats_loss=0.01056, ecapa_loss=0.0001458, whisper_loss=0.08733, over 15330.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.0001375, whisper_loss=0.08981, over 3824878.15 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:42:18,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.53 vs. 
limit=12.0 2024-08-20 12:42:37,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-20 12:42:45,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4806380.0, ans=0.0 2024-08-20 12:42:47,831 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.200e+00 2024-08-20 12:42:48,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2024-08-20 12:42:58,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.273e+01 2.543e+01 2.928e+01 3.106e+03, threshold=5.086e+01, percent-clipped=1.0 2024-08-20 12:43:16,315 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 21 from LS+wenet, 19 from Vox, 40 from AS 2024-08-20 12:43:18,442 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07709788531064987, model_norm_threshold=50.86475372314453 2024-08-20 12:43:18,613 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.952e+04, grad_sumsq=6.952e+04, orig_rms_sq=1.000e+00 2024-08-20 12:43:37,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4806580.0, ans=0.0 2024-08-20 12:44:07,342 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6500, loss[loss=0.09608, beats_loss=0.01214, ecapa_loss=0.0001385, whisper_loss=0.08255, over 19769.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001377, whisper_loss=0.09033, over 3791766.11 frames. 
], batch size: 80, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:44:11,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-20 12:44:21,379 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 12:44:54,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4806980.0, ans=0.05 2024-08-20 12:45:10,513 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 from AS 2024-08-20 12:45:10,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4807080.0, ans=0.0 2024-08-20 12:45:31,341 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 17 from Vox, 35 from AS 2024-08-20 12:45:35,338 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 30 from Vox, 30 from AS 2024-08-20 12:45:37,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4807180.0, ans=0.125 2024-08-20 12:45:37,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4807180.0, ans=0.125 2024-08-20 12:45:40,568 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6550, loss[loss=0.1169, beats_loss=0.007234, ecapa_loss=0.0001519, whisper_loss=0.1082, over 15192.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001389, whisper_loss=0.09013, over 3833592.68 frames. 
], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:46:10,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.299e+01 2.491e+01 2.817e+01 6.597e+02, threshold=4.982e+01, percent-clipped=1.0 2024-08-20 12:46:20,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=4807480.0, ans=6.0 2024-08-20 12:46:36,839 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 from AS 2024-08-20 12:46:40,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4807580.0, ans=0.07 2024-08-20 12:46:48,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2024-08-20 12:47:11,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4807680.0, ans=0.125 2024-08-20 12:47:16,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4807680.0, ans=0.0 2024-08-20 12:47:24,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-20 12:47:27,986 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 from AS 2024-08-20 12:47:31,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4807780.0, ans=0.025 2024-08-20 12:47:32,730 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6600, loss[loss=0.09192, beats_loss=0.009097, ecapa_loss=0.0001411, whisper_loss=0.08141, over 16011.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.00014, whisper_loss=0.09027, over 3833813.74 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:47:33,001 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 from AS 2024-08-20 12:47:35,699 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 21 from LS+wenet, 20 from Vox, 50 from AS 2024-08-20 12:47:56,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4807880.0, ans=0.0 2024-08-20 12:48:14,469 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 from AS 2024-08-20 12:48:20,013 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-20 12:48:29,643 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 from AS 2024-08-20 12:48:38,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4807980.0, ans=0.125 2024-08-20 12:48:47,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2024-08-20 12:49:02,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4808080.0, ans=0.0 2024-08-20 12:49:13,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4808180.0, ans=0.125 2024-08-20 12:49:27,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.37 vs. 
limit=15.0 2024-08-20 12:49:28,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4808180.0, ans=0.1 2024-08-20 12:49:36,563 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6650, loss[loss=0.09128, beats_loss=0.01121, ecapa_loss=0.0001019, whisper_loss=0.07905, over 20624.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001402, whisper_loss=0.0906, over 3874646.09 frames. ], batch size: 83, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:49:38,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4808280.0, ans=0.0 2024-08-20 12:50:06,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4808380.0, ans=0.125 2024-08-20 12:50:07,921 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 18 from LS+wenet, 16 from Vox, 19 from AS 2024-08-20 12:50:15,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.590e+01 2.324e+01 2.596e+01 3.081e+01 4.862e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-20 12:50:31,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.90 vs. limit=10.0 2024-08-20 12:51:00,691 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.85 vs. limit=22.5 2024-08-20 12:51:17,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4808680.0, ans=0.0 2024-08-20 12:51:18,195 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
32 from LS+wenet, 25 from Vox, 33 from AS 2024-08-20 12:51:30,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4808680.0, ans=0.125 2024-08-20 12:51:33,289 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6700, loss[loss=0.1035, beats_loss=0.01061, ecapa_loss=0.0001552, whisper_loss=0.0913, over 23134.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001404, whisper_loss=0.09061, over 3906352.72 frames. ], batch size: 94, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:52:23,013 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 27 from LS+wenet, 17 from Vox, 26 from AS 2024-08-20 12:52:37,082 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS 2024-08-20 12:52:50,195 INFO [train_multi_KD3.py:845] (3/4) A total of 97 cuts. 32 from LS+wenet, 24 from Vox, 41 from AS 2024-08-20 12:52:57,454 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 from AS 2024-08-20 12:53:01,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4809180.0, ans=0.0 2024-08-20 12:53:05,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4809280.0, ans=0.2 2024-08-20 12:53:05,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4809280.0, ans=0.125 2024-08-20 12:53:05,835 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6750, loss[loss=0.1132, beats_loss=0.008823, ecapa_loss=0.0001871, whisper_loss=0.1025, over 22047.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.09113, over 3923388.47 frames. 
], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:53:06,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4809280.0, ans=0.125 2024-08-20 12:53:22,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4809380.0, ans=0.2 2024-08-20 12:53:32,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4809380.0, ans=0.125 2024-08-20 12:53:35,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.304e+01 2.494e+01 2.775e+01 4.602e+01, threshold=4.987e+01, percent-clipped=0.0 2024-08-20 12:53:56,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4809580.0, ans=0.125 2024-08-20 12:53:57,835 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 13 from LS+wenet, 13 from Vox, 33 from AS 2024-08-20 12:54:17,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-20 12:54:19,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4809680.0, ans=0.05 2024-08-20 12:54:32,338 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6800, loss[loss=0.1122, beats_loss=0.008695, ecapa_loss=0.0001692, whisper_loss=0.1018, over 22121.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001416, whisper_loss=0.09146, over 3927791.52 frames. 
], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:54:52,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4809880.0, ans=0.1 2024-08-20 12:55:01,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4809880.0, ans=0.125 2024-08-20 12:55:33,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4810080.0, ans=0.125 2024-08-20 12:55:42,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4810180.0, ans=0.1 2024-08-20 12:55:42,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.58 vs. limit=6.0 2024-08-20 12:55:43,293 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 from AS 2024-08-20 12:55:48,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4810180.0, ans=0.0 2024-08-20 12:55:59,939 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6850, loss[loss=0.06348, beats_loss=0.01222, ecapa_loss=0.0001267, whisper_loss=0.04999, over 15691.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001412, whisper_loss=0.0912, over 3924788.83 frames. ], batch size: 63, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:56:00,578 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
21 from LS+wenet, 14 from Vox, 36 from AS 2024-08-20 12:56:01,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4810280.0, ans=0.2 2024-08-20 12:56:21,681 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 12:56:28,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.376e+01 2.516e+01 2.843e+01 1.582e+02, threshold=5.033e+01, percent-clipped=2.0 2024-08-20 12:56:57,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=15.0 2024-08-20 12:57:00,008 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 16 from LS+wenet, 21 from Vox, 29 from AS 2024-08-20 12:57:16,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4810680.0, ans=0.125 2024-08-20 12:57:30,363 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6900, loss[loss=0.1188, beats_loss=0.008629, ecapa_loss=0.0001405, whisper_loss=0.1088, over 15905.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001419, whisper_loss=0.09026, over 3868637.67 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:57:34,816 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
40 from LS+wenet, 15 from Vox, 35 from AS 2024-08-20 12:57:36,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4810780.0, ans=0.125 2024-08-20 12:57:51,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4810880.0, ans=0.0 2024-08-20 12:57:52,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4810880.0, ans=0.125 2024-08-20 12:57:57,312 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 from AS 2024-08-20 12:57:57,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4810880.0, ans=0.0 2024-08-20 12:58:09,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4810980.0, ans=0.0 2024-08-20 12:58:14,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4810980.0, ans=0.125 2024-08-20 12:58:49,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4811180.0, ans=0.2 2024-08-20 12:58:51,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4811180.0, ans=0.125 2024-08-20 12:58:59,273 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 6950, loss[loss=0.1152, beats_loss=0.009218, ecapa_loss=0.000157, whisper_loss=0.1044, over 16948.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001417, whisper_loss=0.09036, over 3876920.57 frames. 
], batch size: 69, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:59:03,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4811280.0, ans=0.125 2024-08-20 12:59:30,815 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.235e+01 2.363e+01 2.831e+01 5.596e+01, threshold=4.726e+01, percent-clipped=1.0 2024-08-20 13:00:13,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=12.0 2024-08-20 13:00:13,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-20 13:00:13,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2024-08-20 13:00:15,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4811680.0, ans=0.1 2024-08-20 13:00:15,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4811680.0, ans=0.0 2024-08-20 13:00:17,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.13 vs. limit=22.5 2024-08-20 13:00:19,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4811680.0, ans=0.0 2024-08-20 13:00:29,827 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7000, loss[loss=0.1066, beats_loss=0.01196, ecapa_loss=0.0001321, whisper_loss=0.09331, over 22269.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001417, whisper_loss=0.08989, over 3851645.13 frames. 
], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:00:34,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4811780.0, ans=0.1 2024-08-20 13:00:48,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2024-08-20 13:01:07,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-20 13:01:17,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4811980.0, ans=0.125 2024-08-20 13:01:18,702 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 from AS 2024-08-20 13:01:34,811 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 13 from LS+wenet, 15 from Vox, 24 from AS 2024-08-20 13:01:41,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-20 13:01:43,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4812180.0, ans=0.125 2024-08-20 13:01:57,582 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 from AS 2024-08-20 13:01:59,336 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7050, loss[loss=0.07905, beats_loss=0.01184, ecapa_loss=0.0001265, whisper_loss=0.06594, over 15663.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001408, whisper_loss=0.08975, over 3866250.54 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:01:59,558 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
23 from LS+wenet, 15 from Vox, 30 from AS 2024-08-20 13:02:00,981 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 13 from LS+wenet, 29 from Vox, 22 from AS 2024-08-20 13:02:05,191 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 from AS 2024-08-20 13:02:07,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4812280.0, ans=0.125 2024-08-20 13:02:31,557 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.241e+01 2.463e+01 2.779e+01 3.668e+01, threshold=4.925e+01, percent-clipped=0.0 2024-08-20 13:02:35,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.77 vs. limit=15.0 2024-08-20 13:02:37,763 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 from AS 2024-08-20 13:02:44,413 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 from AS 2024-08-20 13:02:54,511 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 from AS 2024-08-20 13:03:10,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4812580.0, ans=0.09899494936611666 2024-08-20 13:03:19,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4812680.0, ans=0.125 2024-08-20 13:03:31,199 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7100, loss[loss=0.1016, beats_loss=0.009832, ecapa_loss=0.0001318, whisper_loss=0.09043, over 16667.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001414, whisper_loss=0.09049, over 3856027.57 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:03:39,000 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
16 from LS+wenet, 23 from Vox, 32 from AS 2024-08-20 13:03:42,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4812780.0, ans=0.125 2024-08-20 13:03:52,880 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 31 from LS+wenet, 17 from Vox, 39 from AS 2024-08-20 13:04:03,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4812880.0, ans=0.0 2024-08-20 13:04:07,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. limit=5.0 2024-08-20 13:04:15,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4812980.0, ans=0.0 2024-08-20 13:04:22,431 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 21 from LS+wenet, 27 from Vox, 36 from AS 2024-08-20 13:04:25,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.65 vs. limit=22.5 2024-08-20 13:04:32,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4813080.0, ans=0.125 2024-08-20 13:05:04,633 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7150, loss[loss=0.08545, beats_loss=0.009524, ecapa_loss=0.00018, whisper_loss=0.07413, over 15301.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001417, whisper_loss=0.08941, over 3816521.61 frames. 
], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:05:33,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4813380.0, ans=0.0 2024-08-20 13:05:35,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4813380.0, ans=0.125 2024-08-20 13:05:36,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.222e+01 2.476e+01 2.874e+01 3.378e+02, threshold=4.952e+01, percent-clipped=1.0 2024-08-20 13:05:39,003 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:05:41,850 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS 2024-08-20 13:06:16,744 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-20 13:06:20,235 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 from AS 2024-08-20 13:06:36,020 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7200, loss[loss=0.0892, beats_loss=0.00802, ecapa_loss=0.0001224, whisper_loss=0.07995, over 18088.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01063, ecapa_loss=0.0001409, whisper_loss=0.0886, over 3855211.90 frames. ], batch size: 70, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:06:40,104 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 
26 from LS+wenet, 12 from Vox, 14 from AS 2024-08-20 13:06:55,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4813880.0, ans=0.125 2024-08-20 13:07:13,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4813980.0, ans=0.2 2024-08-20 13:07:18,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4813980.0, ans=0.2 2024-08-20 13:07:23,555 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 from AS 2024-08-20 13:07:24,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4813980.0, ans=0.0 2024-08-20 13:07:43,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4814080.0, ans=0.125 2024-08-20 13:07:47,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-08-20 13:07:56,320 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09890901297330856, model_norm_threshold=49.522193908691406 2024-08-20 13:07:56,486 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.024e+04, grad_sumsq=9.178e+03, orig_rms_sq=3.294e+00 2024-08-20 13:08:03,063 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 from AS 2024-08-20 13:08:10,563 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7250, loss[loss=0.09295, beats_loss=0.01195, ecapa_loss=0.0001223, whisper_loss=0.07977, over 23291.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01055, ecapa_loss=0.0001399, whisper_loss=0.08877, over 3851195.53 frames. 
], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:08:20,408 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 33 from Vox, 27 fro AS 2024-08-20 13:08:41,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.256e+01 2.635e+01 2.926e+01 5.007e+02, threshold=5.270e+01, percent-clipped=2.0 2024-08-20 13:09:29,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4814680.0, ans=0.0 2024-08-20 13:09:29,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4814680.0, ans=0.0 2024-08-20 13:09:31,365 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 14 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 13:09:39,710 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7300, loss[loss=0.1022, beats_loss=0.01186, ecapa_loss=0.000127, whisper_loss=0.08905, over 19909.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001413, whisper_loss=0.08893, over 3867927.77 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:09:57,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2024-08-20 13:10:00,685 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 18 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-20 13:10:24,234 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 13:10:42,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4815080.0, ans=0.1 2024-08-20 13:10:55,629 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 21 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 13:11:07,672 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
31 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 13:11:08,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4815180.0, ans=0.0 2024-08-20 13:11:10,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4815180.0, ans=0.125 2024-08-20 13:11:12,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4815180.0, ans=0.125 2024-08-20 13:11:14,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4815180.0, ans=0.2 2024-08-20 13:11:17,557 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7350, loss[loss=0.1211, beats_loss=0.009104, ecapa_loss=0.0001436, whisper_loss=0.1106, over 23051.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001407, whisper_loss=0.08953, over 3905706.06 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:11:20,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4815280.0, ans=0.125 2024-08-20 13:11:33,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4815280.0, ans=0.0 2024-08-20 13:11:39,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4815380.0, ans=0.1 2024-08-20 13:11:52,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.308e+01 2.500e+01 2.769e+01 3.790e+01, threshold=5.001e+01, percent-clipped=0.0 2024-08-20 13:11:56,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4815380.0, ans=0.5 2024-08-20 13:12:04,558 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4815480.0, ans=0.1 2024-08-20 13:12:19,397 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 13:12:32,841 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 18 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 13:12:44,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4815680.0, ans=0.125 2024-08-20 13:12:50,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4815680.0, ans=0.125 2024-08-20 13:13:01,553 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7400, loss[loss=0.1021, beats_loss=0.007467, ecapa_loss=0.0001748, whisper_loss=0.09291, over 19491.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001398, whisper_loss=0.08958, over 3896823.96 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:13:15,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4815780.0, ans=0.125 2024-08-20 13:13:26,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4815880.0, ans=0.09899494936611666 2024-08-20 13:13:33,795 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 18 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-20 13:13:38,070 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 21 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 13:13:44,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-08-20 13:13:52,113 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 13:13:55,803 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 13:14:03,523 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 13:14:38,170 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7450, loss[loss=0.09872, beats_loss=0.01001, ecapa_loss=0.0001561, whisper_loss=0.08716, over 15133.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.08987, over 3863053.78 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:14:50,255 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 13:14:52,449 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 13:15:05,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4816380.0, ans=0.125 2024-08-20 13:15:06,661 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 13:15:08,900 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 29 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 13:15:11,844 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.199e+01 2.442e+01 2.667e+01 5.088e+01, threshold=4.883e+01, percent-clipped=1.0 2024-08-20 13:15:40,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4816580.0, ans=0.2 2024-08-20 13:15:47,647 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-20 13:16:03,935 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 13:16:06,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4816680.0, ans=0.07 2024-08-20 13:16:08,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4816680.0, ans=0.1 2024-08-20 13:16:13,325 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 13:16:19,362 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7500, loss[loss=0.1059, beats_loss=0.0113, ecapa_loss=0.0001154, whisper_loss=0.09345, over 23401.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08987, over 3848696.29 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:16:24,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4816780.0, ans=0.0 2024-08-20 13:16:34,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4816780.0, ans=0.125 2024-08-20 13:16:41,681 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 22 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 13:17:07,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4816980.0, ans=0.125 2024-08-20 13:17:09,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4816980.0, ans=0.125 2024-08-20 13:17:14,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4816980.0, ans=0.1 2024-08-20 13:17:31,944 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
25 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 13:17:45,163 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 13:18:01,090 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7550, loss[loss=0.1025, beats_loss=0.009286, ecapa_loss=0.0001451, whisper_loss=0.09175, over 17493.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.08962, over 3852423.82 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:18:08,479 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 13:18:20,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. limit=10.0 2024-08-20 13:18:21,873 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 13 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 13:18:34,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.221e+01 2.519e+01 2.793e+01 1.462e+02, threshold=5.038e+01, percent-clipped=2.0 2024-08-20 13:18:43,592 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 13:19:02,241 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 20 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 13:19:28,107 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 15 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-20 13:19:35,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.65 vs. 
limit=15.0 2024-08-20 13:19:41,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4817780.0, ans=0.125 2024-08-20 13:19:42,580 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7600, loss[loss=0.1019, beats_loss=0.009837, ecapa_loss=0.0001321, whisper_loss=0.09071, over 22716.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.08917, over 3862832.52 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:19:44,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=4817780.0, ans=22.5 2024-08-20 13:19:58,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4817780.0, ans=0.0 2024-08-20 13:20:00,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4817780.0, ans=0.0 2024-08-20 13:20:16,859 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 13:20:17,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2024-08-20 13:20:37,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4817980.0, ans=0.0 2024-08-20 13:20:53,217 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 13:20:55,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4818080.0, ans=0.5 2024-08-20 13:21:16,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4818180.0, ans=0.1 2024-08-20 13:21:19,815 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7650, loss[loss=0.1238, beats_loss=0.007868, ecapa_loss=0.0001429, whisper_loss=0.1145, over 22185.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001411, whisper_loss=0.08944, over 3855140.26 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:21:20,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4818280.0, ans=0.1 2024-08-20 13:21:34,975 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 13:21:52,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.373e+01 2.628e+01 2.994e+01 5.178e+01, threshold=5.256e+01, percent-clipped=1.0 2024-08-20 13:21:53,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=15.0 2024-08-20 13:22:13,895 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 20 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 13:22:18,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4818580.0, ans=0.09899494936611666 2024-08-20 13:22:24,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4818580.0, ans=0.125 2024-08-20 13:22:51,455 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
23 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-20 13:22:54,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4818680.0, ans=0.2 2024-08-20 13:22:57,176 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7700, loss[loss=0.1063, beats_loss=0.0087, ecapa_loss=0.0001489, whisper_loss=0.09616, over 24000.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001419, whisper_loss=0.08929, over 3842417.97 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:22:57,344 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-20 13:23:23,873 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 13:23:29,665 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 13:23:50,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4818980.0, ans=0.125 2024-08-20 13:23:57,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4819080.0, ans=0.1 2024-08-20 13:24:03,697 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-20 13:24:05,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4819080.0, ans=0.5 2024-08-20 13:24:14,552 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 29 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 13:24:16,297 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
15 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-20 13:24:20,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4819180.0, ans=0.05 2024-08-20 13:24:35,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4819180.0, ans=0.1 2024-08-20 13:24:39,019 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7750, loss[loss=0.1204, beats_loss=0.008353, ecapa_loss=0.0001464, whisper_loss=0.1106, over 16224.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01032, ecapa_loss=0.0001423, whisper_loss=0.0892, over 3823494.75 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:24:48,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4819280.0, ans=0.125 2024-08-20 13:25:00,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4819380.0, ans=0.125 2024-08-20 13:25:02,558 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.371e-02 2024-08-20 13:25:11,809 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
18 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 13:25:15,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4819380.0, ans=0.0 2024-08-20 13:25:16,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.228e+01 2.412e+01 2.689e+01 6.555e+01, threshold=4.823e+01, percent-clipped=1.0 2024-08-20 13:25:23,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4819480.0, ans=0.125 2024-08-20 13:25:33,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4819480.0, ans=0.035 2024-08-20 13:25:40,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-20 13:25:40,991 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 13:25:58,265 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 15 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 13:26:02,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4819680.0, ans=0.125 2024-08-20 13:26:06,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4819680.0, ans=0.125 2024-08-20 13:26:17,569 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7800, loss[loss=0.09008, beats_loss=0.01236, ecapa_loss=0.0001455, whisper_loss=0.07626, over 20184.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.08944, over 3818311.62 frames. ], batch size: 84, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:26:29,872 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 13:26:56,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4819980.0, ans=0.125 2024-08-20 13:27:09,402 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 13:27:46,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4820180.0, ans=0.2 2024-08-20 13:27:52,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4820180.0, ans=0.125 2024-08-20 13:27:58,225 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7850, loss[loss=0.09502, beats_loss=0.009999, ecapa_loss=0.0001634, whisper_loss=0.08339, over 17691.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01033, ecapa_loss=0.0001408, whisper_loss=0.08935, over 3862035.11 frames. ], batch size: 73, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:28:23,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4820380.0, ans=0.0 2024-08-20 13:28:33,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=12.0 2024-08-20 13:28:34,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.297e+01 2.493e+01 2.759e+01 4.989e+01, threshold=4.986e+01, percent-clipped=1.0 2024-08-20 13:29:00,448 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 13:29:05,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4820580.0, ans=0.2 2024-08-20 13:29:07,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-20 13:29:22,068 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 13:29:24,192 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 13:29:34,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4820680.0, ans=0.0 2024-08-20 13:29:40,346 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7900, loss[loss=0.09473, beats_loss=0.01129, ecapa_loss=0.0001305, whisper_loss=0.08213, over 21232.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001406, whisper_loss=0.08951, over 3848077.90 frames. ], batch size: 85, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:29:40,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4820780.0, ans=0.1 2024-08-20 13:29:46,340 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 13:29:48,475 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
14 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 13:29:53,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4820780.0, ans=0.125 2024-08-20 13:29:57,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4820780.0, ans=0.0 2024-08-20 13:30:02,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4820880.0, ans=0.125 2024-08-20 13:30:02,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4820880.0, ans=0.125 2024-08-20 13:30:18,533 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 33 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 13:30:44,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.37 vs. limit=12.0 2024-08-20 13:30:50,075 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 13:30:53,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4821080.0, ans=0.125 2024-08-20 13:30:54,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2024-08-20 13:30:56,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. 
limit=15.0 2024-08-20 13:31:08,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4821180.0, ans=0.0 2024-08-20 13:31:19,935 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 7950, loss[loss=0.0984, beats_loss=0.01037, ecapa_loss=0.0001522, whisper_loss=0.08651, over 19312.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01027, ecapa_loss=0.00014, whisper_loss=0.08996, over 3864056.39 frames. ], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:31:23,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4821280.0, ans=0.0 2024-08-20 13:31:50,707 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 31 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 13:31:56,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.297e+01 2.495e+01 2.761e+01 3.642e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-20 13:32:29,152 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 13:32:48,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4821680.0, ans=0.2 2024-08-20 13:32:52,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4821680.0, ans=0.0 2024-08-20 13:32:53,845 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 13:32:55,135 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8000, loss[loss=0.1169, beats_loss=0.009984, ecapa_loss=0.0001404, whisper_loss=0.1056, over 22360.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01029, ecapa_loss=0.0001411, whisper_loss=0.08991, over 3827721.38 frames. 
], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:32:55,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4821780.0, ans=0.125 2024-08-20 13:33:07,060 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-20 13:33:09,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4821780.0, ans=0.2 2024-08-20 13:33:19,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4821880.0, ans=0.125 2024-08-20 13:33:34,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2024-08-20 13:33:41,489 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 13:34:01,350 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 13:34:09,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4822080.0, ans=0.0 2024-08-20 13:34:32,048 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8050, loss[loss=0.09162, beats_loss=0.01239, ecapa_loss=0.0001353, whisper_loss=0.07788, over 14334.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001405, whisper_loss=0.08979, over 3829132.06 frames. 
], batch size: 58, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:34:43,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4822280.0, ans=0.125 2024-08-20 13:34:56,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4822380.0, ans=0.0 2024-08-20 13:35:04,064 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 13:35:07,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.231e+01 2.467e+01 2.722e+01 2.720e+02, threshold=4.934e+01, percent-clipped=1.0 2024-08-20 13:35:08,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4822380.0, ans=0.2 2024-08-20 13:35:09,878 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 25 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 13:35:11,141 WARNING [optim.py:496] (3/4) Scaling gradients by 0.020375000312924385, model_norm_threshold=49.342281341552734 2024-08-20 13:35:11,342 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.714e+05, grad_sumsq=7.714e+05, orig_rms_sq=1.000e+00 2024-08-20 13:35:14,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4822480.0, ans=0.125 2024-08-20 13:35:15,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4822480.0, ans=0.0 2024-08-20 13:35:16,978 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
27 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 13:35:43,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4822580.0, ans=0.07 2024-08-20 13:36:10,108 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8100, loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001209, whisper_loss=0.08986, over 22336.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001396, whisper_loss=0.09026, over 3811046.74 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:36:17,766 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 13:36:20,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4822780.0, ans=0.125 2024-08-20 13:36:21,212 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 13:37:02,034 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:37:03,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4822980.0, ans=0.2 2024-08-20 13:37:10,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4823080.0, ans=0.125 2024-08-20 13:37:26,539 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:37:35,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4823180.0, ans=0.2 2024-08-20 13:37:36,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.79 vs. 
limit=22.5 2024-08-20 13:37:39,045 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-20 13:37:42,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4823180.0, ans=0.125 2024-08-20 13:37:50,175 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8150, loss[loss=0.09345, beats_loss=0.009881, ecapa_loss=0.0001453, whisper_loss=0.08212, over 19189.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.08989, over 3807209.93 frames. ], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:37:55,938 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 13:38:18,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4823380.0, ans=0.0 2024-08-20 13:38:23,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.223e+01 2.514e+01 2.822e+01 2.422e+03, threshold=5.028e+01, percent-clipped=2.0 2024-08-20 13:38:25,966 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 15 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 13:38:38,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4823480.0, ans=0.1 2024-08-20 13:38:54,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4823580.0, ans=0.125 2024-08-20 13:39:09,419 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 13:39:26,040 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8200, loss[loss=0.112, beats_loss=0.01104, ecapa_loss=0.0001453, whisper_loss=0.09948, over 20200.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001388, whisper_loss=0.09033, over 3778887.86 frames. 
], batch size: 82, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:39:48,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4823880.0, ans=0.0 2024-08-20 13:40:02,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2024-08-20 13:40:08,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-20 13:40:14,853 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:40:25,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-20 13:40:31,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=12.0 2024-08-20 13:41:05,393 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8250, loss[loss=0.09653, beats_loss=0.01201, ecapa_loss=0.0001443, whisper_loss=0.08308, over 19030.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.09012, over 3792779.96 frames. 
], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:41:08,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4824280.0, ans=0.125 2024-08-20 13:41:22,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4824280.0, ans=0.1 2024-08-20 13:41:36,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4824380.0, ans=0.125 2024-08-20 13:41:40,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.279e+01 2.520e+01 2.852e+01 4.224e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-20 13:42:00,624 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 13:42:06,750 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 13:42:10,747 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:42:44,657 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8300, loss[loss=0.102, beats_loss=0.008961, ecapa_loss=0.0001349, whisper_loss=0.09168, over 23028.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.09006, over 3805250.73 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:42:51,069 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 13:43:26,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4824980.0, ans=0.0 2024-08-20 13:43:29,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4824980.0, ans=0.1 2024-08-20 13:43:29,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4824980.0, ans=0.1 2024-08-20 13:43:30,903 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-20 13:43:36,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=12.0 2024-08-20 13:44:00,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4825080.0, ans=0.0 2024-08-20 13:44:22,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4825280.0, ans=0.0 2024-08-20 13:44:23,243 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8350, loss[loss=0.1174, beats_loss=0.009507, ecapa_loss=0.0001458, whisper_loss=0.1064, over 18835.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001394, whisper_loss=0.09011, over 3838268.66 frames. ], batch size: 76, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:44:28,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4825280.0, ans=0.125 2024-08-20 13:44:30,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. 
limit=15.0 2024-08-20 13:44:34,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4825280.0, ans=0.125 2024-08-20 13:44:42,504 WARNING [optim.py:496] (3/4) Scaling gradients by 0.025893952697515488, model_norm_threshold=50.39912796020508 2024-08-20 13:44:42,673 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.172e+05, grad_sumsq=8.560e+07, orig_rms_sq=1.071e-02 2024-08-20 13:44:55,742 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 13:44:58,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.380e+01 2.657e+01 2.993e+01 1.946e+03, threshold=5.314e+01, percent-clipped=1.0 2024-08-20 13:44:59,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4825380.0, ans=0.0 2024-08-20 13:45:30,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2024-08-20 13:45:51,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4825680.0, ans=0.05 2024-08-20 13:45:57,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4825680.0, ans=0.125 2024-08-20 13:46:02,261 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8400, loss[loss=0.08129, beats_loss=0.01232, ecapa_loss=0.0001194, whisper_loss=0.06778, over 19541.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001397, whisper_loss=0.08964, over 3814890.91 frames. 
], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:46:09,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4825780.0, ans=0.125 2024-08-20 13:46:13,682 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:46:32,998 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 14 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-20 13:46:36,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4825880.0, ans=0.5 2024-08-20 13:46:39,195 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 13:46:50,296 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 13:47:12,308 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 13:47:36,031 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 13:47:38,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4826180.0, ans=0.0 2024-08-20 13:47:38,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-20 13:47:40,628 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8450, loss[loss=0.05418, beats_loss=0.01397, ecapa_loss=0.000118, whisper_loss=0.03903, over 12801.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08919, over 3807507.17 frames. 
], batch size: 52, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:47:41,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4826280.0, ans=0.0 2024-08-20 13:47:48,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2024-08-20 13:47:51,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4826280.0, ans=0.125 2024-08-20 13:48:03,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2024-08-20 13:48:15,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.337e+01 2.560e+01 2.840e+01 5.771e+01, threshold=5.121e+01, percent-clipped=2.0 2024-08-20 13:48:25,559 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 13:49:05,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4826680.0, ans=0.0 2024-08-20 13:49:17,571 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8500, loss[loss=0.04493, beats_loss=0.01264, ecapa_loss=0.0001551, whisper_loss=0.03074, over 12614.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01048, ecapa_loss=0.0001402, whisper_loss=0.08887, over 3824726.52 frames. 
], batch size: 53, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:49:37,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4826880.0, ans=0.0 2024-08-20 13:49:55,718 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:49:57,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4826980.0, ans=0.5 2024-08-20 13:50:01,994 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 13:50:07,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4826980.0, ans=0.0 2024-08-20 13:50:15,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2024-08-20 13:50:20,658 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 13:50:27,365 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 13:50:40,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4827180.0, ans=0.125 2024-08-20 13:50:44,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4827180.0, ans=0.125 2024-08-20 13:50:46,713 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8550, loss[loss=0.1162, beats_loss=0.008752, ecapa_loss=0.0001509, whisper_loss=0.106, over 18779.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08993, over 3816487.41 frames. 
], batch size: 74, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:51:00,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4827280.0, ans=0.5 2024-08-20 13:51:02,209 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 13:51:09,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4827380.0, ans=0.5 2024-08-20 13:51:20,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.327e+01 2.509e+01 2.689e+01 1.250e+02, threshold=5.019e+01, percent-clipped=1.0 2024-08-20 13:51:21,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.22 vs. limit=10.0 2024-08-20 13:51:32,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4827480.0, ans=0.0 2024-08-20 13:51:38,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4827480.0, ans=0.125 2024-08-20 13:51:51,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=8.0 2024-08-20 13:51:53,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.66 vs. limit=15.0 2024-08-20 13:52:03,646 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
16 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-20 13:52:09,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4827680.0, ans=0.125 2024-08-20 13:52:18,416 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8600, loss[loss=0.08432, beats_loss=0.01155, ecapa_loss=0.0001655, whisper_loss=0.07111, over 14931.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.09008, over 3820674.40 frames. ], batch size: 65, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:52:31,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4827780.0, ans=0.05 2024-08-20 13:52:32,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4827780.0, ans=0.125 2024-08-20 13:52:40,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-20 13:53:18,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=23.35 vs. limit=22.5 2024-08-20 13:53:30,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4828180.0, ans=0.125 2024-08-20 13:53:35,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4828180.0, ans=0.125 2024-08-20 13:53:43,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4828180.0, ans=0.2 2024-08-20 13:53:47,620 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8650, loss[loss=0.1163, beats_loss=0.01131, ecapa_loss=0.000132, whisper_loss=0.1037, over 20645.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.08999, over 3815775.26 frames. ], batch size: 85, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:53:50,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4828280.0, ans=0.125 2024-08-20 13:54:21,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.355e+01 2.590e+01 2.852e+01 2.640e+02, threshold=5.179e+01, percent-clipped=3.0 2024-08-20 13:54:38,359 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 13:54:39,876 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 25 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-20 13:54:45,627 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 29 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 13:55:21,895 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8700, loss[loss=0.08965, beats_loss=0.01261, ecapa_loss=0.000127, whisper_loss=0.07577, over 13841.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01026, ecapa_loss=0.0001409, whisper_loss=0.09086, over 3829668.00 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:55:24,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-20 13:55:27,111 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 13:55:37,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4828780.0, ans=0.0 2024-08-20 13:55:38,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4828880.0, ans=0.0 2024-08-20 13:55:52,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4828880.0, ans=0.1 2024-08-20 13:56:13,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4828980.0, ans=0.125 2024-08-20 13:56:24,699 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 13:56:35,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4829180.0, ans=0.125 2024-08-20 13:56:41,047 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 13:56:43,584 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 16 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 13:56:53,949 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8750, loss[loss=0.09514, beats_loss=0.01147, ecapa_loss=0.0001065, whisper_loss=0.08261, over 17767.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01028, ecapa_loss=0.00014, whisper_loss=0.09085, over 3808294.77 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:57:06,133 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 29 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 13:57:16,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-08-20 13:57:22,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4829380.0, ans=0.1 2024-08-20 13:57:29,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.350e+01 2.524e+01 2.778e+01 9.782e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-20 13:57:46,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0 2024-08-20 13:57:54,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0 2024-08-20 13:57:55,489 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 25 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 13:58:10,674 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 13:58:11,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=10.0 2024-08-20 13:58:15,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4829680.0, ans=0.0 2024-08-20 13:58:24,234 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8800, loss[loss=0.1076, beats_loss=0.01134, ecapa_loss=0.0001425, whisper_loss=0.09486, over 13604.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.00014, whisper_loss=0.09089, over 3816203.42 frames. 
], batch size: 55, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:58:43,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4829880.0, ans=0.0 2024-08-20 13:58:45,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4829880.0, ans=0.0 2024-08-20 13:58:47,930 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-20 13:59:14,628 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 19 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 13:59:18,044 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 13:59:30,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4830080.0, ans=0.0 2024-08-20 13:59:55,621 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8850, loss[loss=0.09909, beats_loss=0.009667, ecapa_loss=0.0001386, whisper_loss=0.08804, over 18960.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.08988, over 3812054.79 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:00:06,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4830280.0, ans=0.035 2024-08-20 14:00:13,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=4830380.0, ans=0.2 2024-08-20 14:00:15,360 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
31 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 14:00:30,719 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.235e+01 2.490e+01 2.757e+01 4.655e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-20 14:00:41,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4830480.0, ans=0.125 2024-08-20 14:01:27,830 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 14:01:29,252 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8900, loss[loss=0.09241, beats_loss=0.009898, ecapa_loss=0.000102, whisper_loss=0.08149, over 17605.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.09008, over 3781407.22 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:01:34,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4830780.0, ans=0.125 2024-08-20 14:01:43,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2024-08-20 14:01:57,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4830880.0, ans=0.125 2024-08-20 14:02:09,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2024-08-20 14:02:10,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4830980.0, ans=0.0 2024-08-20 14:02:30,395 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 14:02:34,426 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
22 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 14:02:36,475 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 14:02:42,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-20 14:02:59,821 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 8950, loss[loss=0.1084, beats_loss=0.01147, ecapa_loss=0.000114, whisper_loss=0.09583, over 19799.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001386, whisper_loss=0.08997, over 3785702.57 frames. ], batch size: 79, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:03:08,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4831280.0, ans=0.1 2024-08-20 14:03:15,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4831280.0, ans=0.125 2024-08-20 14:03:18,139 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 14:03:20,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-08-20 14:03:24,647 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
12 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 14:03:30,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.318e+01 2.492e+01 2.834e+01 3.721e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-20 14:03:35,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4831480.0, ans=0.125 2024-08-20 14:03:49,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.46 vs. limit=22.5 2024-08-20 14:03:52,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2024-08-20 14:04:12,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4831680.0, ans=0.125 2024-08-20 14:04:26,526 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9000, loss[loss=0.09557, beats_loss=0.01416, ecapa_loss=0.0001055, whisper_loss=0.08036, over 22753.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001381, whisper_loss=0.09, over 3810048.10 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:04:26,527 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 14:05:10,580 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005032, whisper_loss=0.2493, over 931116.00 frames. 2024-08-20 14:05:34,745 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on SV_voxceleb1: loss=0.003984, beats_loss=0, ecapa_loss=0.0003984, whisper_loss=0, over 944235.00 frames. 
2024-08-20 14:07:15,755 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6214, 2.2806, 2.1303, 2.0640], device='cuda:3') 2024-08-20 14:07:37,370 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 14:07:37,373 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 14:07:37,876 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 37 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 14:07:46,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4831780.0, ans=0.0 2024-08-20 14:07:53,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4831880.0, ans=0.125 2024-08-20 14:07:59,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. 
limit=10.0 2024-08-20 14:08:00,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4831880.0, ans=0.0 2024-08-20 14:08:19,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4831980.0, ans=0.1 2024-08-20 14:08:19,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4831980.0, ans=0.125 2024-08-20 14:08:25,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4831980.0, ans=0.125 2024-08-20 14:08:33,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4832080.0, ans=0.0 2024-08-20 14:08:39,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4832080.0, ans=0.035 2024-08-20 14:08:42,916 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 14:08:55,942 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 25 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 14:08:58,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4832180.0, ans=0.1 2024-08-20 14:09:00,246 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9050, loss[loss=0.1226, beats_loss=0.009039, ecapa_loss=0.0001639, whisper_loss=0.112, over 15001.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001382, whisper_loss=0.09016, over 3803213.14 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:09:00,671 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
31 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 14:09:08,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4832280.0, ans=0.125 2024-08-20 14:09:10,336 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 14:09:29,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.212e+01 2.355e+01 2.625e+01 3.620e+01, threshold=4.711e+01, percent-clipped=0.0 2024-08-20 14:09:37,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4832480.0, ans=0.0 2024-08-20 14:10:00,819 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 14:10:01,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4832580.0, ans=0.125 2024-08-20 14:10:03,063 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 14:10:03,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-08-20 14:10:24,909 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9100, loss[loss=0.08195, beats_loss=0.009588, ecapa_loss=0.0001692, whisper_loss=0.07067, over 15590.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001378, whisper_loss=0.09032, over 3813588.86 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:10:44,448 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 14:10:46,229 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
26 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-20 14:10:48,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-20 14:10:53,903 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.652e+01 2024-08-20 14:11:21,171 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 14:11:36,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4833180.0, ans=0.0 2024-08-20 14:11:37,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=15.0 2024-08-20 14:11:40,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.45 vs. limit=22.5 2024-08-20 14:11:45,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2024-08-20 14:11:52,713 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9150, loss[loss=0.1312, beats_loss=0.008279, ecapa_loss=0.0001582, whisper_loss=0.1213, over 20228.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001373, whisper_loss=0.08962, over 3830521.90 frames. 
], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:12:00,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4833280.0, ans=0.125 2024-08-20 14:12:20,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4833380.0, ans=0.125 2024-08-20 14:12:22,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.265e+01 2.494e+01 2.821e+01 1.323e+02, threshold=4.988e+01, percent-clipped=2.0 2024-08-20 14:12:34,860 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 14:12:42,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.72 vs. limit=6.0 2024-08-20 14:12:48,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2024-08-20 14:13:02,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4833680.0, ans=0.1 2024-08-20 14:13:19,905 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9200, loss[loss=0.118, beats_loss=0.007114, ecapa_loss=0.0001613, whisper_loss=0.1092, over 18872.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.000138, whisper_loss=0.08961, over 3826634.71 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:13:30,962 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 22 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 14:13:52,206 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 35 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-20 14:14:06,759 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
21 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 14:14:07,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0 2024-08-20 14:14:12,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4834080.0, ans=0.125 2024-08-20 14:14:13,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4834080.0, ans=0.0 2024-08-20 14:14:18,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4834080.0, ans=0.125 2024-08-20 14:14:34,401 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-08-20 14:14:37,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0 2024-08-20 14:14:46,041 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9250, loss[loss=0.08155, beats_loss=0.01439, ecapa_loss=0.0001069, whisper_loss=0.06609, over 22884.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001373, whisper_loss=0.08948, over 3796360.47 frames. ], batch size: 95, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:14:50,373 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 14:14:50,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4834280.0, ans=0.125 2024-08-20 14:14:52,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4834280.0, ans=0.125 2024-08-20 14:14:56,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4834280.0, ans=0.125 2024-08-20 14:15:14,105 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 14:15:16,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.272e+01 2.596e+01 3.076e+01 4.662e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-20 14:15:25,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4834480.0, ans=0.0 2024-08-20 14:15:44,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4834580.0, ans=0.125 2024-08-20 14:15:49,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4834580.0, ans=0.0 2024-08-20 14:16:13,256 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9300, loss[loss=0.07748, beats_loss=0.01073, ecapa_loss=0.0001397, whisper_loss=0.06535, over 17774.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01055, ecapa_loss=0.0001392, whisper_loss=0.08861, over 3778626.80 frames. ], batch size: 74, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:16:13,525 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 14:16:26,471 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
14 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 14:16:31,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4834880.0, ans=0.0 2024-08-20 14:16:35,252 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 14:16:48,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4834980.0, ans=0.125 2024-08-20 14:16:53,631 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-20 14:17:08,093 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-08-20 14:17:18,659 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-20 14:17:44,333 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9350, loss[loss=0.1076, beats_loss=0.008731, ecapa_loss=0.0001652, whisper_loss=0.09726, over 17526.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001394, whisper_loss=0.08925, over 3812497.63 frames. 
], batch size: 73, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:17:45,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4835280.0, ans=0.0 2024-08-20 14:17:49,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4835280.0, ans=0.1 2024-08-20 14:18:12,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4835380.0, ans=0.2 2024-08-20 14:18:17,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.510e+01 2.725e+01 8.699e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 14:18:25,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4835480.0, ans=0.125 2024-08-20 14:18:27,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4835480.0, ans=0.1 2024-08-20 14:18:34,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4835480.0, ans=0.125 2024-08-20 14:18:49,048 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 14:18:49,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4835580.0, ans=0.1 2024-08-20 14:18:50,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2024-08-20 14:18:52,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4835580.0, ans=0.125 2024-08-20 14:18:54,536 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 14:19:16,743 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9400, loss[loss=0.1038, beats_loss=0.01065, ecapa_loss=0.0001243, whisper_loss=0.09187, over 22432.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001397, whisper_loss=0.08992, over 3884726.48 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:19:23,665 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 35 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 14:19:23,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4835780.0, ans=0.2 2024-08-20 14:19:29,108 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 14:19:49,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4835880.0, ans=0.0 2024-08-20 14:20:13,393 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 33 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 14:20:17,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=4836080.0, ans=6.0 2024-08-20 14:20:43,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4836180.0, ans=0.1 2024-08-20 14:20:47,418 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9450, loss[loss=0.1181, beats_loss=0.01058, ecapa_loss=0.0001079, whisper_loss=0.1064, over 24413.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.09111, over 3888468.17 frames. 
], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:21:20,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.301e+01 2.582e+01 2.889e+01 4.439e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-20 14:21:43,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-08-20 14:22:16,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4836680.0, ans=0.2 2024-08-20 14:22:17,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5 2024-08-20 14:22:21,459 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9500, loss[loss=0.1021, beats_loss=0.01223, ecapa_loss=0.0001133, whisper_loss=0.08869, over 16562.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001396, whisper_loss=0.09082, over 3920913.67 frames. ], batch size: 64, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:22:22,005 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 11 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 14:22:29,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4836780.0, ans=0.125 2024-08-20 14:22:34,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4836780.0, ans=0.0 2024-08-20 14:22:48,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-20 14:23:08,505 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 14:23:09,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4836980.0, ans=0.0 2024-08-20 14:23:15,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4837080.0, ans=0.0 2024-08-20 14:23:37,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4837180.0, ans=0.0 2024-08-20 14:23:37,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4837180.0, ans=0.04949747468305833 2024-08-20 14:23:39,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4837180.0, ans=0.0 2024-08-20 14:23:49,519 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9550, loss[loss=0.1067, beats_loss=0.008663, ecapa_loss=0.0001551, whisper_loss=0.09647, over 21017.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001401, whisper_loss=0.09055, over 3893229.76 frames. ], batch size: 84, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:23:51,861 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 14:23:53,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4837280.0, ans=0.0 2024-08-20 14:24:00,576 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 14 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 14:24:17,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.10 vs. 
limit=22.5 2024-08-20 14:24:21,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.249e+01 2.469e+01 2.805e+01 3.890e+01, threshold=4.937e+01, percent-clipped=0.0 2024-08-20 14:24:22,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4837380.0, ans=0.125 2024-08-20 14:24:25,580 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-20 14:24:31,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4837480.0, ans=0.125 2024-08-20 14:24:36,092 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 32 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 14:24:36,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4837480.0, ans=0.2 2024-08-20 14:24:51,097 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 14:24:58,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4837580.0, ans=0.0 2024-08-20 14:25:13,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4837680.0, ans=0.2 2024-08-20 14:25:18,382 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 14:25:19,327 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9600, loss[loss=0.1161, beats_loss=0.009781, ecapa_loss=0.0001282, whisper_loss=0.105, over 20886.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.09016, over 3883393.67 frames. ], batch size: 80, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:25:48,904 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 14:26:20,413 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 14:26:30,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4838180.0, ans=0.125 2024-08-20 14:26:31,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4838180.0, ans=0.0 2024-08-20 14:26:34,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4838180.0, ans=0.125 2024-08-20 14:26:48,012 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9650, loss[loss=0.08745, beats_loss=0.009345, ecapa_loss=0.0001903, whisper_loss=0.0762, over 15913.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.08871, over 3861691.71 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:26:55,331 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 32 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-20 14:27:15,960 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 14:27:20,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.272e+01 2.458e+01 2.754e+01 3.251e+01, threshold=4.916e+01, percent-clipped=0.0 2024-08-20 14:27:35,522 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 24 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 14:28:04,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4838680.0, ans=0.0 2024-08-20 14:28:14,667 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
23 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 14:28:14,998 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.633e+01 2024-08-20 14:28:16,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4838680.0, ans=0.1 2024-08-20 14:28:17,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=12.0 2024-08-20 14:28:18,001 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 14:28:21,305 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9700, loss[loss=0.09002, beats_loss=0.009803, ecapa_loss=0.0001217, whisper_loss=0.079, over 13460.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0104, ecapa_loss=0.0001402, whisper_loss=0.08889, over 3830154.96 frames. ], batch size: 49, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:28:34,149 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-20 14:29:05,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4838980.0, ans=0.125 2024-08-20 14:29:32,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4839080.0, ans=0.125 2024-08-20 14:29:38,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4839180.0, ans=0.0 2024-08-20 14:29:43,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4839180.0, ans=0.125 2024-08-20 14:29:55,255 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9750, loss[loss=0.09685, beats_loss=0.009692, ecapa_loss=0.0001827, whisper_loss=0.08533, over 21425.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01042, ecapa_loss=0.0001405, whisper_loss=0.0888, over 3810629.70 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:30:09,475 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 14:30:15,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2024-08-20 14:30:21,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4839380.0, ans=0.1 2024-08-20 14:30:25,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4839380.0, ans=0.125 2024-08-20 14:30:30,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.339e+01 2.577e+01 2.928e+01 5.580e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:30:42,015 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 14:30:55,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4839580.0, ans=0.1 2024-08-20 14:30:57,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4839580.0, ans=10.0 2024-08-20 14:31:24,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4839680.0, ans=0.035 2024-08-20 14:31:28,884 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9800, loss[loss=0.09289, beats_loss=0.009868, ecapa_loss=0.0001379, whisper_loss=0.08164, over 15200.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01048, ecapa_loss=0.0001399, whisper_loss=0.08875, over 3808497.63 frames. 
], batch size: 58, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:31:46,626 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 14:31:55,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4839880.0, ans=0.0 2024-08-20 14:32:07,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4839980.0, ans=0.2 2024-08-20 14:32:24,780 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-20 14:32:26,646 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 14:32:54,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4840180.0, ans=0.125 2024-08-20 14:32:58,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4840180.0, ans=0.125 2024-08-20 14:33:00,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4840180.0, ans=10.0 2024-08-20 14:33:06,948 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9850, loss[loss=0.1071, beats_loss=0.01121, ecapa_loss=0.0001456, whisper_loss=0.09444, over 22653.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01048, ecapa_loss=0.0001385, whisper_loss=0.08927, over 3768773.98 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:33:42,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.298e+01 2.480e+01 2.698e+01 3.610e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 14:33:45,560 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 14:34:03,125 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 26 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-20 14:34:10,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-20 14:34:20,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4840580.0, ans=0.125 2024-08-20 14:34:35,438 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.035e-03 2024-08-20 14:34:46,790 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9900, loss[loss=0.103, beats_loss=0.01259, ecapa_loss=0.0001305, whisper_loss=0.0891, over 21544.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001393, whisper_loss=0.08928, over 3796654.82 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:34:54,897 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 14:34:59,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4840780.0, ans=0.2 2024-08-20 14:35:08,799 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 14:35:13,710 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 14:35:20,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.01 vs. 
limit=22.5 2024-08-20 14:35:24,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4840980.0, ans=0.0 2024-08-20 14:35:26,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4840980.0, ans=0.0 2024-08-20 14:36:10,578 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 14:36:25,364 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 9950, loss[loss=0.1023, beats_loss=0.009744, ecapa_loss=0.0001576, whisper_loss=0.09096, over 17617.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.000138, whisper_loss=0.08933, over 3762465.55 frames. ], batch size: 73, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:36:26,907 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 14:36:29,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4841280.0, ans=0.125 2024-08-20 14:36:40,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4841280.0, ans=0.5 2024-08-20 14:36:59,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.238e+01 2.460e+01 2.685e+01 1.158e+02, threshold=4.920e+01, percent-clipped=1.0 2024-08-20 14:37:00,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0 2024-08-20 14:37:12,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-08-20 14:37:14,821 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 14:37:27,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4841580.0, ans=0.0 2024-08-20 14:37:32,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4841580.0, ans=0.125 2024-08-20 14:37:49,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-20 14:37:51,718 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10000, loss[loss=0.1171, beats_loss=0.009623, ecapa_loss=0.000153, whisper_loss=0.106, over 16453.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001379, whisper_loss=0.08966, over 3794253.19 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:38:22,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4841880.0, ans=0.0 2024-08-20 14:38:43,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2024-08-20 14:39:25,557 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 14:39:33,013 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10050, loss[loss=0.0973, beats_loss=0.01101, ecapa_loss=0.0001502, whisper_loss=0.08479, over 21877.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0106, ecapa_loss=0.0001386, whisper_loss=0.08903, over 3800180.13 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:39:39,715 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 26 from Vox, 13 fro AS 2024-08-20 14:40:08,622 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 14:40:18,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.420e+01 2.635e+01 2.918e+01 2.672e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-20 14:40:20,397 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 12 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-20 14:40:24,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-08-20 14:40:35,357 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 20 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 14:40:38,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4842480.0, ans=0.125 2024-08-20 14:40:58,601 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 14:41:00,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4842580.0, ans=0.125 2024-08-20 14:41:08,518 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 14:41:22,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4842680.0, ans=0.125 2024-08-20 14:41:30,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2024-08-20 14:41:33,068 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10100, loss[loss=0.08908, beats_loss=0.01032, ecapa_loss=0.0001562, whisper_loss=0.07719, over 20168.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001399, whisper_loss=0.08976, over 3807758.64 frames. 
], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:41:40,990 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-20 14:41:46,197 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 14:42:14,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4842880.0, ans=0.125 2024-08-20 14:42:40,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4842980.0, ans=0.1 2024-08-20 14:43:28,199 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10150, loss[loss=0.1015, beats_loss=0.009481, ecapa_loss=0.0001249, whisper_loss=0.09081, over 16905.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.08986, over 3802980.52 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:43:40,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4843280.0, ans=0.125 2024-08-20 14:43:41,040 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 20 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-20 14:43:48,762 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-20 14:43:59,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4843380.0, ans=0.0 2024-08-20 14:44:03,278 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.366e+01 2.588e+01 2.870e+01 1.184e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-20 14:44:05,397 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 14:44:12,875 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
37 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-20 14:44:57,148 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10200, loss[loss=0.09706, beats_loss=0.01314, ecapa_loss=0.0001011, whisper_loss=0.08291, over 22746.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.000139, whisper_loss=0.09016, over 3815603.92 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:45:12,712 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 21 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-20 14:45:13,996 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 14:45:27,996 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 33 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 14:45:30,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4843980.0, ans=0.1 2024-08-20 14:46:15,322 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 41 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 14:46:15,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4844180.0, ans=0.0 2024-08-20 14:46:17,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4844180.0, ans=0.125 2024-08-20 14:46:26,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4844280.0, ans=0.1 2024-08-20 14:46:27,324 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10250, loss[loss=0.1366, beats_loss=0.005912, ecapa_loss=0.0001246, whisper_loss=0.1294, over 16562.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001397, whisper_loss=0.09061, over 3818506.94 frames. 
], batch size: 59, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:46:34,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4844280.0, ans=0.125 2024-08-20 14:46:47,168 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 14:46:47,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4844380.0, ans=0.125 2024-08-20 14:46:54,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-08-20 14:46:57,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4844380.0, ans=0.125 2024-08-20 14:47:01,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4844380.0, ans=0.125 2024-08-20 14:47:03,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.274e+01 2.551e+01 2.893e+01 4.019e+02, threshold=5.101e+01, percent-clipped=3.0 2024-08-20 14:47:06,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4844480.0, ans=0.125 2024-08-20 14:47:19,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4844480.0, ans=0.125 2024-08-20 14:48:02,609 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10300, loss[loss=0.1176, beats_loss=0.007167, ecapa_loss=0.0002199, whisper_loss=0.1082, over 20359.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.000139, whisper_loss=0.09027, over 3825472.37 frames. 
], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:48:09,174 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 17 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 14:48:27,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-20 14:48:35,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4844880.0, ans=0.0 2024-08-20 14:48:57,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4844980.0, ans=0.125 2024-08-20 14:49:12,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4845080.0, ans=0.125 2024-08-20 14:49:13,587 WARNING [optim.py:496] (3/4) Scaling gradients by 0.02814776450395584, model_norm_threshold=51.010257720947266 2024-08-20 14:49:13,752 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.746e+05, grad_sumsq=8.746e+05, orig_rms_sq=1.000e+00 2024-08-20 14:49:17,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4845080.0, ans=0.1 2024-08-20 14:49:28,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=4845180.0, ans=0.2 2024-08-20 14:49:50,237 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10350, loss[loss=0.09263, beats_loss=0.01068, ecapa_loss=0.0001352, whisper_loss=0.0806, over 20063.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001395, whisper_loss=0.09041, over 3829546.55 frames. 
], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:49:51,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4845280.0, ans=0.0 2024-08-20 14:49:51,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2024-08-20 14:49:55,514 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 14:50:02,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4845280.0, ans=10.0 2024-08-20 14:50:18,476 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 14:50:33,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4845380.0, ans=0.1 2024-08-20 14:50:37,384 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.346e+01 2.546e+01 2.818e+01 1.812e+03, threshold=5.092e+01, percent-clipped=2.0 2024-08-20 14:50:44,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.14 vs. limit=10.0 2024-08-20 14:50:49,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4845480.0, ans=0.125 2024-08-20 14:51:13,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4845580.0, ans=0.0 2024-08-20 14:51:14,266 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 14:51:18,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4845580.0, ans=0.0 2024-08-20 14:51:25,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4845580.0, ans=0.125 2024-08-20 14:51:25,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4845580.0, ans=0.125 2024-08-20 14:51:33,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-08-20 14:51:39,715 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 14:51:43,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4845680.0, ans=0.0 2024-08-20 14:51:54,352 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10400, loss[loss=0.1069, beats_loss=0.00992, ecapa_loss=0.0001599, whisper_loss=0.09539, over 21612.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001398, whisper_loss=0.09017, over 3857796.82 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:51:57,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4845780.0, ans=0.0 2024-08-20 14:52:02,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4845780.0, ans=0.1 2024-08-20 14:52:03,228 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
18 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-20 14:52:10,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4845780.0, ans=0.09899494936611666 2024-08-20 14:53:12,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4846080.0, ans=0.2 2024-08-20 14:53:39,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4846180.0, ans=0.1 2024-08-20 14:53:56,088 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10450, loss[loss=0.09353, beats_loss=0.01025, ecapa_loss=0.0001365, whisper_loss=0.08192, over 18754.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001396, whisper_loss=0.09061, over 3832329.42 frames. ], batch size: 73, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:54:02,042 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.783e-03 2024-08-20 14:54:15,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4846280.0, ans=0.015 2024-08-20 14:54:19,550 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 14:54:42,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.294e+01 2.565e+01 2.796e+01 8.122e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-20 14:54:58,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4846480.0, ans=0.125 2024-08-20 14:55:11,111 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 
27 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-20 14:55:15,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4846580.0, ans=0.2 2024-08-20 14:55:18,752 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 14:55:28,881 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 30 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-20 14:55:54,034 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10500, loss[loss=0.09469, beats_loss=0.01051, ecapa_loss=0.0001512, whisper_loss=0.08266, over 22582.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001395, whisper_loss=0.09035, over 3839565.43 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:56:11,092 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 34 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-20 14:56:16,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4846880.0, ans=0.125 2024-08-20 14:56:24,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-20 14:56:34,505 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 14:56:50,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2024-08-20 14:57:05,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4847080.0, ans=0.0 2024-08-20 14:57:33,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.68 vs. 
limit=15.0 2024-08-20 14:57:50,336 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10550, loss[loss=0.1071, beats_loss=0.01228, ecapa_loss=0.0001202, whisper_loss=0.09365, over 22174.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.000139, whisper_loss=0.09069, over 3894210.61 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:57:51,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4847280.0, ans=0.125 2024-08-20 14:57:54,806 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 14:58:05,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4847280.0, ans=0.125 2024-08-20 14:58:35,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.331e+01 2.577e+01 2.959e+01 5.357e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:58:40,315 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 20 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-20 14:58:53,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4847480.0, ans=0.125 2024-08-20 14:58:54,668 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-20 14:59:45,943 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10600, loss[loss=0.0988, beats_loss=0.009743, ecapa_loss=0.0001599, whisper_loss=0.08746, over 13614.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001392, whisper_loss=0.09034, over 3875335.43 frames. ], batch size: 56, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:59:50,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. 
limit=15.0 2024-08-20 15:00:00,331 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 15:00:12,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.66 vs. limit=22.5 2024-08-20 15:00:27,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4847880.0, ans=0.125 2024-08-20 15:00:30,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4847880.0, ans=0.2 2024-08-20 15:00:37,644 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 13 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-20 15:00:44,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4847980.0, ans=0.2 2024-08-20 15:01:03,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4848080.0, ans=0.1 2024-08-20 15:01:10,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-08-20 15:01:19,133 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 25 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 15:01:34,673 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 15:01:41,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4848180.0, ans=0.125 2024-08-20 15:01:47,020 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10650, loss[loss=0.08516, beats_loss=0.01238, ecapa_loss=0.0001356, whisper_loss=0.07142, over 14504.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001389, whisper_loss=0.08963, over 3874639.30 frames. ], batch size: 63, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:02:07,975 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 15:02:22,378 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.137e+00 2024-08-20 15:02:22,778 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.22 vs. limit=10.0 2024-08-20 15:02:30,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4848380.0, ans=0.1 2024-08-20 15:02:38,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.292e+01 2.599e+01 2.866e+01 5.790e+01, threshold=5.197e+01, percent-clipped=1.0 2024-08-20 15:02:57,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-08-20 15:03:51,624 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 15:03:54,788 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10700, loss[loss=0.1201, beats_loss=0.008878, ecapa_loss=0.0001483, whisper_loss=0.1097, over 22744.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.000139, whisper_loss=0.08957, over 3861648.49 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:04:07,546 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 15:04:53,749 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
17 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-20 15:05:00,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4848980.0, ans=0.125 2024-08-20 15:05:05,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4848980.0, ans=0.125 2024-08-20 15:05:56,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4849180.0, ans=0.125 2024-08-20 15:06:00,160 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10750, loss[loss=0.09511, beats_loss=0.009767, ecapa_loss=0.0001628, whisper_loss=0.08372, over 21162.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.08854, over 3829237.63 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:06:14,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4849280.0, ans=0.1 2024-08-20 15:06:31,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4849380.0, ans=0.0 2024-08-20 15:06:49,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.343e+01 2.598e+01 3.019e+01 8.630e+01, threshold=5.195e+01, percent-clipped=2.0 2024-08-20 15:06:49,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4849480.0, ans=0.125 2024-08-20 15:07:22,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0 2024-08-20 15:07:23,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4849580.0, ans=0.125 2024-08-20 15:07:27,741 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 
30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 15:07:55,348 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 15:07:57,576 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10800, loss[loss=0.09432, beats_loss=0.01098, ecapa_loss=0.0001202, whisper_loss=0.08213, over 17858.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01053, ecapa_loss=0.0001396, whisper_loss=0.08854, over 3841286.09 frames. ], batch size: 69, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:08:09,899 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 15:08:17,338 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 15:08:37,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4849880.0, ans=0.025 2024-08-20 15:08:56,217 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 15:09:07,909 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 23 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 15:09:23,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4850080.0, ans=0.0 2024-08-20 15:09:33,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4850180.0, ans=0.2 2024-08-20 15:09:53,684 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10850, loss[loss=0.08893, beats_loss=0.01165, ecapa_loss=0.0001138, whisper_loss=0.07614, over 23031.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08856, over 3847068.68 frames. 
], batch size: 92, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:10:00,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4850280.0, ans=0.125 2024-08-20 15:10:00,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4850280.0, ans=0.0 2024-08-20 15:10:36,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4850380.0, ans=0.125 2024-08-20 15:10:42,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.250e+01 2.553e+01 2.783e+01 2.694e+02, threshold=5.105e+01, percent-clipped=1.0 2024-08-20 15:10:46,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4850480.0, ans=0.125 2024-08-20 15:11:02,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4850480.0, ans=0.125 2024-08-20 15:11:26,888 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 15:11:37,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4850680.0, ans=0.125 2024-08-20 15:11:44,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4850680.0, ans=0.2 2024-08-20 15:11:57,394 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10900, loss[loss=0.1032, beats_loss=0.0102, ecapa_loss=0.0001237, whisper_loss=0.09174, over 22825.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.08895, over 3829072.37 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:12:33,977 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
35 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 15:13:11,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4851080.0, ans=0.1 2024-08-20 15:13:26,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=4851080.0, ans=0.1 2024-08-20 15:13:54,339 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 10950, loss[loss=0.09949, beats_loss=0.009291, ecapa_loss=0.0001437, whisper_loss=0.08876, over 17241.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001405, whisper_loss=0.08919, over 3816546.38 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:14:36,186 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 21 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-20 15:14:40,423 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.141e+01 2.427e+01 2.900e+01 4.434e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 15:14:55,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-20 15:15:05,106 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 15:15:12,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-20 15:15:16,802 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
13 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 15:15:18,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4851580.0, ans=0.2 2024-08-20 15:15:23,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4851580.0, ans=0.1 2024-08-20 15:15:26,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4851580.0, ans=0.125 2024-08-20 15:15:33,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4851680.0, ans=10.0 2024-08-20 15:15:38,703 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06528465449810028, model_norm_threshold=48.54976272583008 2024-08-20 15:15:38,869 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.33, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.799e+05, orig_rms_sq=1.000e+00 2024-08-20 15:15:39,132 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 16 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 15:15:45,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2024-08-20 15:15:48,418 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 22 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-20 15:15:53,483 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11000, loss[loss=0.08416, beats_loss=0.012, ecapa_loss=0.0001455, whisper_loss=0.07071, over 18274.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.00014, whisper_loss=0.08938, over 3815651.54 frames. ], batch size: 75, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:15:53,738 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 15:15:57,096 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 19 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 15:16:01,791 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 28 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 15:16:12,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4851780.0, ans=0.125 2024-08-20 15:16:13,650 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 27 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 15:16:16,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4851880.0, ans=0.0 2024-08-20 15:16:32,372 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 29 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-20 15:16:52,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4851980.0, ans=0.0 2024-08-20 15:17:13,767 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 15:17:21,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=22.5 2024-08-20 15:17:26,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4852180.0, ans=0.0 2024-08-20 15:17:47,113 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11050, loss[loss=0.1258, beats_loss=0.009763, ecapa_loss=0.0001324, whisper_loss=0.1147, over 22654.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001409, whisper_loss=0.09066, over 3837353.44 frames. 
], batch size: 90, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:17:53,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4852280.0, ans=0.125 2024-08-20 15:18:29,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4852380.0, ans=0.1 2024-08-20 15:18:29,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4852380.0, ans=0.125 2024-08-20 15:18:35,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.285e+01 2.516e+01 2.757e+01 7.437e+02, threshold=5.033e+01, percent-clipped=2.0 2024-08-20 15:18:40,175 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 15:18:46,137 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 15:18:50,506 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 15:19:42,087 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-20 15:19:46,487 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11100, loss[loss=0.1139, beats_loss=0.01021, ecapa_loss=0.0001481, whisper_loss=0.1022, over 23050.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001406, whisper_loss=0.09088, over 3873277.30 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:20:09,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4852880.0, ans=0.2 2024-08-20 15:20:39,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. 
limit=15.0 2024-08-20 15:21:19,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4853180.0, ans=0.5 2024-08-20 15:21:22,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2024-08-20 15:21:44,381 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11150, loss[loss=0.1228, beats_loss=0.00704, ecapa_loss=0.0001709, whisper_loss=0.114, over 17043.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.09066, over 3857733.54 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:21:51,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4853280.0, ans=0.125 2024-08-20 15:22:20,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4853380.0, ans=0.125 2024-08-20 15:22:25,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4853380.0, ans=0.0 2024-08-20 15:22:27,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4853380.0, ans=0.0 2024-08-20 15:22:30,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.306e+01 2.527e+01 2.770e+01 3.887e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-20 15:22:37,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-20 15:22:46,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. 
limit=15.0 2024-08-20 15:22:59,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=4853580.0, ans=12.0 2024-08-20 15:23:09,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.31 vs. limit=15.0 2024-08-20 15:23:11,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4853580.0, ans=0.0 2024-08-20 15:23:29,928 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 15:23:46,555 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11200, loss[loss=0.1016, beats_loss=0.01226, ecapa_loss=0.000126, whisper_loss=0.08803, over 23114.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001396, whisper_loss=0.09021, over 3857199.34 frames. ], batch size: 94, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:23:47,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4853780.0, ans=0.1 2024-08-20 15:24:32,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4853880.0, ans=0.035 2024-08-20 15:25:01,364 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2024-08-20 15:25:09,889 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
25 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 15:25:19,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4854080.0, ans=0.09899494936611666 2024-08-20 15:25:26,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4854080.0, ans=0.125 2024-08-20 15:25:53,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4854180.0, ans=0.0 2024-08-20 15:25:56,978 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11250, loss[loss=0.08418, beats_loss=0.01214, ecapa_loss=0.0001036, whisper_loss=0.07099, over 19170.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001406, whisper_loss=0.09046, over 3869282.15 frames. ], batch size: 76, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:25:57,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4854280.0, ans=0.1 2024-08-20 15:26:04,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-20 15:26:09,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.17 vs. 
limit=12.0 2024-08-20 15:26:40,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4854380.0, ans=0.125 2024-08-20 15:26:44,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4854480.0, ans=0.125 2024-08-20 15:26:45,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.343e+01 2.555e+01 2.874e+01 4.205e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-20 15:26:45,382 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 16 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-20 15:27:10,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4854580.0, ans=0.09899494936611666 2024-08-20 15:27:31,639 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 15:27:54,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4854680.0, ans=0.1 2024-08-20 15:27:58,055 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11300, loss[loss=0.1089, beats_loss=0.01038, ecapa_loss=0.0001593, whisper_loss=0.09693, over 21910.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001404, whisper_loss=0.09012, over 3830042.99 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:28:03,661 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 15:28:08,321 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
26 from LS+wenet, 12 from Vox, 17 fro AS 2024-08-20 15:28:09,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4854780.0, ans=0.0 2024-08-20 15:28:12,248 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.325e-03 2024-08-20 15:28:41,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4854880.0, ans=0.125 2024-08-20 15:28:51,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4854980.0, ans=0.125 2024-08-20 15:29:09,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4855080.0, ans=0.0 2024-08-20 15:29:19,924 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 20 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-20 15:29:49,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-20 15:29:52,126 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 19 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-20 15:29:54,629 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 33 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 15:29:59,446 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11350, loss[loss=0.09762, beats_loss=0.01244, ecapa_loss=0.0001357, whisper_loss=0.08382, over 20458.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001406, whisper_loss=0.09064, over 3844922.14 frames. 
], batch size: 81, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:30:21,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4855280.0, ans=0.04949747468305833 2024-08-20 15:30:23,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4855380.0, ans=0.125 2024-08-20 15:30:25,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4855380.0, ans=0.05 2024-08-20 15:30:34,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4855380.0, ans=0.125 2024-08-20 15:30:41,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4855380.0, ans=0.125 2024-08-20 15:30:48,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4855480.0, ans=0.2 2024-08-20 15:30:49,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.328e+01 2.511e+01 2.770e+01 2.674e+02, threshold=5.022e+01, percent-clipped=3.0 2024-08-20 15:31:23,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.61 vs. limit=22.5 2024-08-20 15:31:37,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4855680.0, ans=0.125 2024-08-20 15:31:56,135 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 15:32:03,238 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11400, loss[loss=0.08898, beats_loss=0.01236, ecapa_loss=0.0001447, whisper_loss=0.07517, over 22071.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08956, over 3804829.49 frames. 
], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:32:11,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4855780.0, ans=0.1 2024-08-20 15:32:19,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4855780.0, ans=0.125 2024-08-20 15:32:50,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.48 vs. limit=10.0 2024-08-20 15:33:05,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4855980.0, ans=0.125 2024-08-20 15:33:22,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4856080.0, ans=0.1 2024-08-20 15:33:29,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4856080.0, ans=0.125 2024-08-20 15:33:32,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4856080.0, ans=0.1 2024-08-20 15:33:45,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2024-08-20 15:33:45,903 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 15:33:55,529 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 15:33:59,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4856180.0, ans=0.0 2024-08-20 15:34:02,588 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11450, loss[loss=0.09409, beats_loss=0.009819, ecapa_loss=0.0001272, whisper_loss=0.083, over 17246.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001408, whisper_loss=0.08978, over 3818201.91 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:34:27,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4856380.0, ans=0.125 2024-08-20 15:34:30,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2024-08-20 15:34:40,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-08-20 15:34:51,080 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 19 from LS+wenet, 16 from Vox, 16 fro AS 2024-08-20 15:34:53,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.356e+01 2.634e+01 3.043e+01 3.885e+01, threshold=5.268e+01, percent-clipped=0.0 2024-08-20 15:35:19,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2024-08-20 15:35:19,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. 
limit=15.0 2024-08-20 15:35:25,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4856580.0, ans=0.0 2024-08-20 15:35:35,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4856580.0, ans=0.125 2024-08-20 15:35:50,635 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-08-20 15:36:02,522 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11500, loss[loss=0.1059, beats_loss=0.01184, ecapa_loss=0.0001169, whisper_loss=0.09294, over 23379.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001404, whisper_loss=0.09001, over 3825695.27 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:36:19,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4856780.0, ans=0.125 2024-08-20 15:36:48,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2024-08-20 15:36:51,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4856980.0, ans=0.0 2024-08-20 15:36:59,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4856980.0, ans=0.5 2024-08-20 15:37:28,099 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
18 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 15:37:29,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4857180.0, ans=0.125 2024-08-20 15:37:45,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4857180.0, ans=0.2 2024-08-20 15:37:45,220 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.689e+01 2024-08-20 15:37:53,876 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11550, loss[loss=0.1165, beats_loss=0.01027, ecapa_loss=0.0001391, whisper_loss=0.1048, over 19832.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001404, whisper_loss=0.08999, over 3794602.74 frames. ], batch size: 78, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:37:57,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4857280.0, ans=0.125 2024-08-20 15:38:11,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4857280.0, ans=0.0 2024-08-20 15:38:14,079 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 15:38:19,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4857380.0, ans=0.0 2024-08-20 15:38:25,570 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
29 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 15:38:29,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4857380.0, ans=0.125 2024-08-20 15:38:40,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.202e+01 2.508e+01 2.840e+01 4.143e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-20 15:39:13,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4857580.0, ans=0.2 2024-08-20 15:39:19,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4857580.0, ans=0.125 2024-08-20 15:39:21,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4857580.0, ans=0.0 2024-08-20 15:39:36,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4857680.0, ans=0.125 2024-08-20 15:39:47,011 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11600, loss[loss=0.1034, beats_loss=0.01209, ecapa_loss=0.0001217, whisper_loss=0.09009, over 18634.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08969, over 3836014.40 frames. ], batch size: 71, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:40:01,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4857780.0, ans=0.0 2024-08-20 15:40:04,494 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 15:40:09,741 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.833e+01 2024-08-20 15:40:12,618 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
21 from LS+wenet, 34 from Vox, 36 fro AS 2024-08-20 15:40:36,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4857980.0, ans=0.125 2024-08-20 15:40:46,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4857980.0, ans=0.0 2024-08-20 15:40:58,498 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.095e-01 2024-08-20 15:40:58,508 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.638e-01 2024-08-20 15:41:11,324 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 27 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 15:41:36,341 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11650, loss[loss=0.1053, beats_loss=0.01005, ecapa_loss=0.0001259, whisper_loss=0.09398, over 20478.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001407, whisper_loss=0.08949, over 3838883.00 frames. ], batch size: 79, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:41:41,250 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 30 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 15:41:56,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4858280.0, ans=0.125 2024-08-20 15:41:58,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4858380.0, ans=0.04949747468305833 2024-08-20 15:42:24,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.221e+01 2.529e+01 2.912e+01 8.219e+01, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 15:42:43,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. 
limit=15.0 2024-08-20 15:42:49,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4858580.0, ans=0.0 2024-08-20 15:43:17,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2024-08-20 15:43:34,405 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11700, loss[loss=0.118, beats_loss=0.01036, ecapa_loss=0.0001412, whisper_loss=0.1062, over 21727.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.08995, over 3841496.52 frames. ], batch size: 85, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:43:38,823 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 15:43:45,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.92 vs. limit=22.5 2024-08-20 15:44:01,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4858880.0, ans=0.035 2024-08-20 15:44:52,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=12.0 2024-08-20 15:45:00,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4859080.0, ans=0.0 2024-08-20 15:45:27,532 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11750, loss[loss=0.1039, beats_loss=0.01008, ecapa_loss=0.0001449, whisper_loss=0.09239, over 21706.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.09021, over 3821480.59 frames. 
], batch size: 87, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:45:33,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4859280.0, ans=0.125 2024-08-20 15:45:54,011 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-20 15:45:58,131 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 15:46:10,689 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.330e+01 2.512e+01 2.808e+01 3.989e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 15:46:27,272 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 15:46:35,288 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 15:46:50,061 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03734064847230911, model_norm_threshold=50.2408561706543 2024-08-20 15:46:50,227 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.797e+05, grad_sumsq=2.797e+05, orig_rms_sq=1.000e+00 2024-08-20 15:46:50,496 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 34 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-20 15:46:58,141 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 24 from Vox, 11 fro AS 2024-08-20 15:47:14,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2024-08-20 15:47:14,977 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11800, loss[loss=0.09517, beats_loss=0.01052, ecapa_loss=0.000113, whisper_loss=0.08352, over 15669.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01029, ecapa_loss=0.000139, whisper_loss=0.09058, over 3800622.25 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:47:24,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4859780.0, ans=0.125 2024-08-20 15:47:43,223 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 15:48:00,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-20 15:48:04,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4859980.0, ans=0.2 2024-08-20 15:48:06,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4859980.0, ans=0.0 2024-08-20 15:48:12,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4859980.0, ans=0.1 2024-08-20 15:48:15,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-08-20 15:48:18,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4860080.0, ans=0.2 2024-08-20 15:48:58,014 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11850, loss[loss=0.1092, beats_loss=0.009885, ecapa_loss=0.0001603, whisper_loss=0.0977, over 18692.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001394, whisper_loss=0.09046, over 3820462.52 frames. 
], batch size: 72, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:49:01,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-20 15:49:34,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4860380.0, ans=0.1 2024-08-20 15:49:36,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.263e+01 2.480e+01 2.849e+01 1.345e+03, threshold=4.961e+01, percent-clipped=1.0 2024-08-20 15:49:41,254 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 17 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-20 15:49:43,304 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 12 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-20 15:49:53,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4860480.0, ans=0.125 2024-08-20 15:50:16,111 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 27 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 15:50:26,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2024-08-20 15:50:26,821 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 15:50:38,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4860780.0, ans=0.1 2024-08-20 15:50:38,963 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11900, loss[loss=0.1095, beats_loss=0.01125, ecapa_loss=0.0001304, whisper_loss=0.09691, over 23551.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.09055, over 3842017.12 frames. 
], batch size: 93, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:50:52,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4860780.0, ans=0.0 2024-08-20 15:50:56,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4860780.0, ans=0.1 2024-08-20 15:51:56,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4861080.0, ans=0.125 2024-08-20 15:52:04,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4861180.0, ans=0.0 2024-08-20 15:52:18,439 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 16 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 15:52:22,949 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 11950, loss[loss=0.1085, beats_loss=0.01102, ecapa_loss=0.000168, whisper_loss=0.09581, over 21255.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.08982, over 3814808.25 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:52:24,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-20 15:52:48,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2024-08-20 15:52:55,215 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 
23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 15:53:06,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.342e+01 2.523e+01 2.820e+01 2.544e+02, threshold=5.046e+01, percent-clipped=1.0 2024-08-20 15:53:10,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4861480.0, ans=0.125 2024-08-20 15:53:16,874 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 14 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 15:53:18,721 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 30 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-20 15:53:31,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4861580.0, ans=0.0 2024-08-20 15:53:35,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4861580.0, ans=0.2 2024-08-20 15:53:57,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4861680.0, ans=0.2 2024-08-20 15:54:01,413 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 15:54:07,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4861680.0, ans=0.1 2024-08-20 15:54:12,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4861780.0, ans=0.0 2024-08-20 15:54:12,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4861780.0, ans=0.125 2024-08-20 15:54:13,283 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12000, loss[loss=0.1152, beats_loss=0.008744, ecapa_loss=0.0001545, whisper_loss=0.1049, over 22257.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001397, whisper_loss=0.09035, over 3849946.74 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:54:13,283 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 15:54:33,567 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6234, 3.4454, 3.0465, 2.9512], device='cuda:3') 2024-08-20 15:54:48,601 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on ASR_libri: loss=0.2555, beats_loss=0, ecapa_loss=0.000501, whisper_loss=0.2505, over 931116.00 frames. 2024-08-20 15:55:14,136 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on SV_voxceleb1: loss=0.003892, beats_loss=0, ecapa_loss=0.0003892, whisper_loss=0, over 944235.00 frames. 2024-08-20 15:56:55,561 INFO [train_multi_KD3.py:1150] (3/4) Epoch 33, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 15:56:55,570 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 15:57:19,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4861880.0, ans=0.2 2024-08-20 15:57:26,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4861880.0, ans=0.07 2024-08-20 15:57:30,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4861980.0, ans=0.1 2024-08-20 15:57:32,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4861980.0, ans=0.2 2024-08-20 15:57:33,465 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 15:57:47,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4862080.0, ans=0.05 2024-08-20 15:58:08,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2024-08-20 15:58:19,398 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12050, loss[loss=0.1238, beats_loss=0.009256, ecapa_loss=0.0001311, whisper_loss=0.1132, over 22718.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001393, whisper_loss=0.09022, over 3833940.37 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:58:28,323 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 22 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 15:58:30,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4862280.0, ans=0.125 2024-08-20 15:58:33,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4862280.0, ans=0.5 2024-08-20 15:58:33,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4862280.0, ans=0.125 2024-08-20 15:58:53,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.378e+01 2.665e+01 2.948e+01 5.073e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-20 15:59:26,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4862680.0, ans=0.2 2024-08-20 15:59:44,454 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12100, loss[loss=0.09079, beats_loss=0.01063, ecapa_loss=0.0001327, whisper_loss=0.07884, over 16979.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.0898, over 3829931.02 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:59:44,702 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 15:59:52,737 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 16:00:26,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-08-20 16:00:34,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4863080.0, ans=0.2 2024-08-20 16:00:35,030 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 34 from LS+wenet, 33 from Vox, 23 fro AS 2024-08-20 16:00:42,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-08-20 16:00:53,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4863180.0, ans=0.1 2024-08-20 16:01:07,086 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12150, loss[loss=0.08197, beats_loss=0.01404, ecapa_loss=0.0001185, whisper_loss=0.06675, over 20562.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001406, whisper_loss=0.08962, over 3872447.88 frames. ], batch size: 85, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:01:16,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.18 vs. 
limit=22.5 2024-08-20 16:01:20,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4863280.0, ans=0.125 2024-08-20 16:01:32,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-08-20 16:01:39,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.292e+01 2.549e+01 2.868e+01 6.331e+01, threshold=5.097e+01, percent-clipped=1.0 2024-08-20 16:01:47,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2024-08-20 16:02:11,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4863680.0, ans=0.2 2024-08-20 16:02:12,613 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 31 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 16:02:25,444 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 16:02:28,287 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12200, loss[loss=0.07624, beats_loss=0.0126, ecapa_loss=0.0001492, whisper_loss=0.06215, over 21181.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001415, whisper_loss=0.08943, over 3812145.40 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:02:37,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2024-08-20 16:03:18,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4864080.0, ans=0.2 2024-08-20 16:03:33,902 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
12 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 16:03:34,447 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-20 16:03:38,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4864180.0, ans=0.125 2024-08-20 16:03:48,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4864280.0, ans=0.125 2024-08-20 16:03:49,862 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12250, loss[loss=0.1271, beats_loss=0.008397, ecapa_loss=0.0001171, whisper_loss=0.1175, over 17141.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001413, whisper_loss=0.08897, over 3815042.23 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:03:55,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2024-08-20 16:04:19,604 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2024-08-20 16:04:21,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.269e+01 2.404e+01 2.750e+01 9.360e+01, threshold=4.808e+01, percent-clipped=1.0 2024-08-20 16:04:29,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4864480.0, ans=0.125 2024-08-20 16:04:44,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4864580.0, ans=0.1 2024-08-20 16:04:52,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4864580.0, ans=0.1 2024-08-20 16:05:02,217 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 13 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 16:05:03,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-08-20 16:05:11,857 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12300, loss[loss=0.08185, beats_loss=0.01075, ecapa_loss=0.0001495, whisper_loss=0.06961, over 22702.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01048, ecapa_loss=0.0001411, whisper_loss=0.08859, over 3810664.39 frames. 
], batch size: 96, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:05:33,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4864880.0, ans=10.0 2024-08-20 16:05:36,380 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05332305282354355, model_norm_threshold=48.08091354370117 2024-08-20 16:05:36,547 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.333e+05, grad_sumsq=1.333e+05, orig_rms_sq=1.000e+00 2024-08-20 16:05:52,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2024-08-20 16:05:58,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4864980.0, ans=0.0 2024-08-20 16:06:28,186 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 22 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-20 16:06:33,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4865280.0, ans=0.0 2024-08-20 16:06:34,546 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12350, loss[loss=0.1192, beats_loss=0.007882, ecapa_loss=0.0001402, whisper_loss=0.11, over 23776.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.08911, over 3811730.20 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:07:08,943 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.318e+01 2.528e+01 2.855e+01 9.017e+02, threshold=5.055e+01, percent-clipped=1.0 2024-08-20 16:07:11,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. 
limit=10.0 2024-08-20 16:07:16,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2024-08-20 16:07:27,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4865580.0, ans=0.125 2024-08-20 16:07:29,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4865580.0, ans=0.0 2024-08-20 16:07:32,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4865580.0, ans=0.2 2024-08-20 16:07:42,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4865680.0, ans=0.1 2024-08-20 16:07:47,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4865680.0, ans=0.0 2024-08-20 16:07:54,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4865680.0, ans=0.125 2024-08-20 16:08:00,599 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12400, loss[loss=0.07495, beats_loss=0.01402, ecapa_loss=9.894e-05, whisper_loss=0.05994, over 22441.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.08917, over 3817224.60 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:08:01,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4865780.0, ans=10.0 2024-08-20 16:08:18,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.42 vs. 
limit=12.0 2024-08-20 16:08:48,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4865980.0, ans=0.125 2024-08-20 16:08:52,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4865980.0, ans=0.125 2024-08-20 16:08:56,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.66 vs. limit=22.5 2024-08-20 16:09:10,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4866080.0, ans=0.2 2024-08-20 16:09:36,252 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 20 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 16:09:39,951 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12450, loss[loss=0.0959, beats_loss=0.009198, ecapa_loss=0.0001191, whisper_loss=0.08551, over 16408.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.0001388, whisper_loss=0.08888, over 3798532.98 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:10:22,391 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.270e+01 2.513e+01 2.843e+01 4.408e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-20 16:10:33,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4866480.0, ans=0.2 2024-08-20 16:10:54,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4866580.0, ans=0.125 2024-08-20 16:11:10,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4866680.0, ans=0.2 2024-08-20 16:11:13,832 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 
18 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-20 16:11:17,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0 2024-08-20 16:11:23,880 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12500, loss[loss=0.1166, beats_loss=0.0102, ecapa_loss=0.0001354, whisper_loss=0.1051, over 21191.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001384, whisper_loss=0.08911, over 3795487.13 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:11:33,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4866780.0, ans=0.125 2024-08-20 16:11:33,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4866780.0, ans=0.2 2024-08-20 16:11:41,042 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 16:11:55,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4866880.0, ans=0.2 2024-08-20 16:12:02,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4866880.0, ans=0.1 2024-08-20 16:12:06,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4866980.0, ans=0.125 2024-08-20 16:12:40,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4867080.0, ans=0.125 2024-08-20 16:12:47,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4867080.0, ans=0.125 2024-08-20 16:13:04,573 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
20 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-20 16:13:15,593 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12550, loss[loss=0.118, beats_loss=0.008179, ecapa_loss=0.0001575, whisper_loss=0.1082, over 23474.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.08998, over 3788737.80 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:13:21,004 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 16:13:25,384 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 16 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 16:13:26,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4867280.0, ans=0.1 2024-08-20 16:14:02,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4867480.0, ans=0.0 2024-08-20 16:14:02,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.459e+01 2.718e+01 3.101e+01 5.496e+01, threshold=5.435e+01, percent-clipped=1.0 2024-08-20 16:14:10,456 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 16:14:26,843 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 16:15:13,167 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12600, loss[loss=0.1253, beats_loss=0.009376, ecapa_loss=0.0001426, whisper_loss=0.1145, over 23258.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.08948, over 3809903.44 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:15:15,722 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 16:16:11,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0 2024-08-20 16:17:00,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4868180.0, ans=0.1 2024-08-20 16:17:05,907 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12650, loss[loss=0.1005, beats_loss=0.01122, ecapa_loss=0.000129, whisper_loss=0.08798, over 17655.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001383, whisper_loss=0.08965, over 3774448.97 frames. ], batch size: 70, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:17:06,153 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 16:17:21,926 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-20 16:17:28,722 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 30 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-20 16:17:43,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.21 vs. limit=15.0 2024-08-20 16:17:46,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-20 16:17:51,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.312e+01 2.541e+01 2.719e+01 3.789e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-20 16:18:02,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4868480.0, ans=0.1 2024-08-20 16:18:33,314 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 16:18:51,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2024-08-20 16:18:58,743 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12700, loss[loss=0.09311, beats_loss=0.01078, ecapa_loss=0.0001318, whisper_loss=0.08102, over 13878.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001382, whisper_loss=0.08932, over 3778010.53 frames. ], batch size: 52, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:19:06,629 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 14 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 16:19:18,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4868780.0, ans=10.0 2024-08-20 16:19:24,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4868880.0, ans=0.0 2024-08-20 16:19:24,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4868880.0, ans=0.1 2024-08-20 16:19:34,033 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 16:20:03,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.41 vs. 
limit=15.0 2024-08-20 16:20:05,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4869080.0, ans=0.0 2024-08-20 16:20:20,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4869080.0, ans=0.0 2024-08-20 16:20:22,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4869080.0, ans=0.125 2024-08-20 16:20:50,493 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 16:20:51,325 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12750, loss[loss=0.1052, beats_loss=0.009548, ecapa_loss=0.0001377, whisper_loss=0.09429, over 18169.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001377, whisper_loss=0.08969, over 3809878.93 frames. ], batch size: 72, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:20:52,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=12.0 2024-08-20 16:20:56,243 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 16:21:02,444 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 25 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 16:21:03,780 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.020e+00 2024-08-20 16:21:20,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2024-08-20 16:21:34,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.42 vs. 
limit=12.0 2024-08-20 16:21:35,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.357e+01 2.635e+01 3.039e+01 5.268e+01, threshold=5.270e+01, percent-clipped=2.0 2024-08-20 16:21:55,638 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 16:22:06,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.99 vs. limit=6.0 2024-08-20 16:22:22,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4869680.0, ans=0.1 2024-08-20 16:22:29,313 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 34 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-20 16:22:35,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4869780.0, ans=0.125 2024-08-20 16:22:36,638 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12800, loss[loss=0.09463, beats_loss=0.01076, ecapa_loss=0.0001421, whisper_loss=0.08245, over 19430.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.000138, whisper_loss=0.09046, over 3868230.61 frames. ], batch size: 80, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:23:03,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2024-08-20 16:23:11,049 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 31 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 16:23:34,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4869980.0, ans=0.125 2024-08-20 16:23:42,363 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 16:23:44,863 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 16:24:26,342 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12850, loss[loss=0.09282, beats_loss=0.01059, ecapa_loss=0.000143, whisper_loss=0.0808, over 14793.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001396, whisper_loss=0.08982, over 3843213.03 frames. ], batch size: 59, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:24:50,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4870380.0, ans=0.0 2024-08-20 16:24:58,173 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 26 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-20 16:25:07,293 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 16:25:11,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-08-20 16:25:11,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.389e+01 2.612e+01 2.924e+01 4.831e+01, threshold=5.224e+01, percent-clipped=0.0 2024-08-20 16:25:11,572 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 10 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-20 16:25:23,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4870480.0, ans=0.1 2024-08-20 16:25:24,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4870480.0, ans=0.0 2024-08-20 16:25:28,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4870480.0, ans=0.125 2024-08-20 16:25:29,825 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
11 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 16:25:41,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2024-08-20 16:25:44,249 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 16:25:56,646 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 20 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-20 16:26:12,663 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12900, loss[loss=0.06455, beats_loss=0.01227, ecapa_loss=0.0001142, whisper_loss=0.05114, over 13112.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.08919, over 3822828.98 frames. ], batch size: 52, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:26:32,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4870880.0, ans=0.2 2024-08-20 16:26:35,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4870880.0, ans=0.0 2024-08-20 16:26:51,712 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 14 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 16:27:00,143 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 16:27:18,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4871080.0, ans=0.2 2024-08-20 16:27:51,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4871180.0, ans=0.125 2024-08-20 16:27:58,872 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 12950, loss[loss=0.08389, beats_loss=0.01053, ecapa_loss=0.0001284, whisper_loss=0.07207, over 16726.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001389, whisper_loss=0.08945, over 3831003.97 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:28:40,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.303e+01 2.529e+01 2.820e+01 1.360e+02, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 16:28:57,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4871480.0, ans=0.0 2024-08-20 16:28:57,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2024-08-20 16:29:13,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4871580.0, ans=0.125 2024-08-20 16:29:13,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2024-08-20 16:29:20,084 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.048e-02 2024-08-20 16:29:31,983 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 16:29:40,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. 
limit=15.0 2024-08-20 16:29:40,539 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08873618394136429, model_norm_threshold=50.58396911621094 2024-08-20 16:29:40,703 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.213e+04, grad_sumsq=8.006e+03, orig_rms_sq=9.010e+00 2024-08-20 16:29:47,546 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13000, loss[loss=0.09159, beats_loss=0.01192, ecapa_loss=0.0001325, whisper_loss=0.07835, over 22354.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001392, whisper_loss=0.08939, over 3805356.68 frames. ], batch size: 94, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:30:14,227 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 16:30:22,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4871880.0, ans=0.0 2024-08-20 16:30:33,834 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08446179330348969, model_norm_threshold=50.58396911621094 2024-08-20 16:30:33,999 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.496e+04, grad_sumsq=4.191e+06, orig_rms_sq=1.073e-02 2024-08-20 16:30:38,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.44 vs. 
limit=15.0 2024-08-20 16:30:46,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4871980.0, ans=0.125 2024-08-20 16:30:58,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4872080.0, ans=0.0 2024-08-20 16:31:30,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4872180.0, ans=0.125 2024-08-20 16:31:30,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2024-08-20 16:31:39,784 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13050, loss[loss=0.07678, beats_loss=0.01314, ecapa_loss=0.0001501, whisper_loss=0.06214, over 20739.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001405, whisper_loss=0.08998, over 3817121.07 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:31:40,033 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 16:31:59,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4872280.0, ans=0.0 2024-08-20 16:32:10,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4872380.0, ans=0.0 2024-08-20 16:32:21,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.399e+01 2.559e+01 2.848e+01 5.989e+02, threshold=5.117e+01, percent-clipped=3.0 2024-08-20 16:32:29,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4872480.0, ans=0.125 2024-08-20 16:32:30,100 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 16:32:33,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4872480.0, ans=0.125 2024-08-20 16:33:08,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4872680.0, ans=0.0 2024-08-20 16:33:11,386 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 16:33:27,452 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13100, loss[loss=0.1243, beats_loss=0.007857, ecapa_loss=0.0001704, whisper_loss=0.1147, over 23014.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001405, whisper_loss=0.0892, over 3784165.42 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:33:28,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4872780.0, ans=0.2 2024-08-20 16:33:46,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=12.0 2024-08-20 16:33:56,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4872880.0, ans=0.0 2024-08-20 16:33:59,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4872880.0, ans=0.125 2024-08-20 16:34:09,136 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 16:34:25,809 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 30 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 16:34:33,206 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
20 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-20 16:34:45,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2024-08-20 16:35:23,945 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13150, loss[loss=0.09139, beats_loss=0.01074, ecapa_loss=0.0001612, whisper_loss=0.07904, over 13320.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001408, whisper_loss=0.08916, over 3786536.51 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:35:29,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4873280.0, ans=0.0 2024-08-20 16:35:39,743 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 16:35:47,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4873380.0, ans=0.125 2024-08-20 16:35:50,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4873380.0, ans=0.125 2024-08-20 16:35:53,694 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 16:36:00,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4873380.0, ans=0.125 2024-08-20 16:36:10,111 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.480e+01 2.703e+01 4.896e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 16:36:11,861 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 16:36:12,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.54 vs. 
limit=12.0 2024-08-20 16:36:20,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=4873480.0, ans=12.0 2024-08-20 16:36:38,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4873580.0, ans=0.125 2024-08-20 16:36:41,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4873580.0, ans=0.1 2024-08-20 16:37:15,789 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13200, loss[loss=0.08085, beats_loss=0.01073, ecapa_loss=0.0001385, whisper_loss=0.06873, over 16483.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001398, whisper_loss=0.08894, over 3771748.78 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:37:19,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4873780.0, ans=0.0 2024-08-20 16:37:34,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4873780.0, ans=0.125 2024-08-20 16:37:41,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4873880.0, ans=0.1 2024-08-20 16:37:46,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4873880.0, ans=15.0 2024-08-20 16:37:49,086 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 16:37:52,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4873880.0, ans=0.125 2024-08-20 16:38:19,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. limit=22.5 2024-08-20 16:38:25,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4874080.0, ans=0.125 2024-08-20 16:38:35,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4874080.0, ans=0.125 2024-08-20 16:38:59,993 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 16:39:05,622 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13250, loss[loss=0.1156, beats_loss=0.01009, ecapa_loss=0.0001297, whisper_loss=0.1042, over 17502.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001406, whisper_loss=0.08954, over 3783486.62 frames. ], batch size: 68, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:39:15,478 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-20 16:39:23,752 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 16:39:34,917 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 16:39:47,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.277e+01 2.601e+01 3.009e+01 4.180e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-20 16:39:50,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.29 vs. 
limit=15.0 2024-08-20 16:40:16,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4874580.0, ans=0.0 2024-08-20 16:40:22,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4874580.0, ans=0.0 2024-08-20 16:40:38,524 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 16:40:45,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4874680.0, ans=0.125 2024-08-20 16:40:46,836 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 16:40:48,965 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 25 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-20 16:40:51,045 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13300, loss[loss=0.09769, beats_loss=0.0136, ecapa_loss=0.0001266, whisper_loss=0.08282, over 19191.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.09027, over 3813297.07 frames. ], batch size: 80, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:40:53,341 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 20 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-20 16:40:56,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4874780.0, ans=0.1 2024-08-20 16:41:24,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4874880.0, ans=0.125 2024-08-20 16:41:35,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.07 vs. 
limit=15.0 2024-08-20 16:41:46,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4874980.0, ans=0.125 2024-08-20 16:42:01,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4875080.0, ans=0.125 2024-08-20 16:42:02,151 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 16:42:13,219 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 16:42:35,188 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 30 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 16:42:36,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4875180.0, ans=0.1 2024-08-20 16:42:40,146 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13350, loss[loss=0.1017, beats_loss=0.008514, ecapa_loss=0.0001236, whisper_loss=0.09194, over 22565.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.08961, over 3838491.61 frames. ], batch size: 84, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:42:55,495 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 34 from Vox, 30 fro AS 2024-08-20 16:43:22,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. 
limit=15.0 2024-08-20 16:43:23,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.248e+01 2.436e+01 2.816e+01 2.858e+02, threshold=4.871e+01, percent-clipped=3.0 2024-08-20 16:44:17,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4875680.0, ans=0.125 2024-08-20 16:44:21,615 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 16:44:23,403 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 24 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-20 16:44:34,398 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13400, loss[loss=0.09914, beats_loss=0.00938, ecapa_loss=0.0001457, whisper_loss=0.08831, over 20265.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.08901, over 3806360.08 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:44:51,500 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 25 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 16:45:36,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4875980.0, ans=0.125 2024-08-20 16:45:42,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.10 vs. 
limit=15.0 2024-08-20 16:46:00,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4876080.0, ans=0.0 2024-08-20 16:46:20,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4876180.0, ans=0.125 2024-08-20 16:46:32,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4876280.0, ans=0.1 2024-08-20 16:46:33,333 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13450, loss[loss=0.1086, beats_loss=0.009057, ecapa_loss=0.0001525, whisper_loss=0.09806, over 16472.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001411, whisper_loss=0.0894, over 3785536.47 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:46:33,582 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 18 from Vox, 50 fro AS 2024-08-20 16:46:34,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4876280.0, ans=0.0 2024-08-20 16:46:47,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4876280.0, ans=0.1 2024-08-20 16:47:10,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.03 vs. 
limit=15.0 2024-08-20 16:47:19,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4876480.0, ans=0.2 2024-08-20 16:47:20,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.518e+01 2.794e+01 2.882e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-20 16:47:47,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4876580.0, ans=0.2 2024-08-20 16:47:48,768 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 16:48:04,800 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-20 16:48:11,286 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 8 from LS+wenet, 27 from Vox, 17 fro AS 2024-08-20 16:48:25,851 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13500, loss[loss=0.09274, beats_loss=0.01321, ecapa_loss=9.842e-05, whisper_loss=0.07854, over 17342.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01038, ecapa_loss=0.0001404, whisper_loss=0.08914, over 3804182.29 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:48:26,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4876780.0, ans=0.0 2024-08-20 16:48:37,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4876780.0, ans=0.0 2024-08-20 16:48:57,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4876880.0, ans=0.125 2024-08-20 16:49:12,433 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
23 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-20 16:50:02,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2024-08-20 16:50:03,338 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 16:50:14,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4877180.0, ans=10.0 2024-08-20 16:50:16,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4877180.0, ans=0.0 2024-08-20 16:50:21,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4877280.0, ans=0.1 2024-08-20 16:50:21,771 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13550, loss[loss=0.07789, beats_loss=0.01129, ecapa_loss=0.0001594, whisper_loss=0.06501, over 14093.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01044, ecapa_loss=0.0001392, whisper_loss=0.08897, over 3814567.80 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:50:27,391 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-20 16:50:34,647 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 16:50:53,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4877380.0, ans=0.125 2024-08-20 16:50:54,437 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 16:51:04,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4877380.0, ans=0.0 2024-08-20 16:51:10,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.534e+01 2.808e+01 5.425e+01, threshold=5.068e+01, percent-clipped=1.0 2024-08-20 16:51:24,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4877480.0, ans=0.125 2024-08-20 16:51:25,631 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 20 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 16:51:28,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4877480.0, ans=0.125 2024-08-20 16:51:35,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4877580.0, ans=0.0 2024-08-20 16:52:02,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-08-20 16:52:17,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-20 16:52:23,273 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13600, loss[loss=0.1099, beats_loss=0.01254, ecapa_loss=0.0001318, whisper_loss=0.09604, over 22809.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001399, whisper_loss=0.08905, over 3785608.58 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:52:27,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4877780.0, ans=0.125 2024-08-20 16:52:52,450 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
14 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 16:52:56,645 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 14 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 16:52:58,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4877880.0, ans=0.125 2024-08-20 16:53:14,374 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 16:53:25,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2024-08-20 16:53:25,835 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 16 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 16:53:41,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4878080.0, ans=0.0 2024-08-20 16:54:07,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4878180.0, ans=0.1 2024-08-20 16:54:24,385 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13650, loss[loss=0.09247, beats_loss=0.009913, ecapa_loss=0.0001642, whisper_loss=0.08091, over 20768.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001396, whisper_loss=0.08925, over 3790778.07 frames. ], batch size: 86, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:55:11,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.380e+01 2.594e+01 2.939e+01 1.944e+02, threshold=5.188e+01, percent-clipped=3.0 2024-08-20 16:55:20,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4878480.0, ans=0.1 2024-08-20 16:55:21,163 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.59 vs. 
limit=15.0 2024-08-20 16:55:30,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-20 16:55:31,189 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 16:55:39,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4878580.0, ans=0.2 2024-08-20 16:55:45,801 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 16:55:50,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4878580.0, ans=0.025 2024-08-20 16:55:55,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4878580.0, ans=0.95 2024-08-20 16:55:55,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4878580.0, ans=0.2 2024-08-20 16:56:23,406 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13700, loss[loss=0.1053, beats_loss=0.01181, ecapa_loss=0.0001484, whisper_loss=0.09201, over 21529.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.08847, over 3752613.05 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:56:25,273 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
32 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 16:56:30,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4878780.0, ans=0.125 2024-08-20 16:56:32,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4878780.0, ans=0.125 2024-08-20 16:56:38,383 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 40 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 16:56:41,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4878780.0, ans=0.125 2024-08-20 16:57:44,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4879080.0, ans=0.1 2024-08-20 16:57:55,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4879180.0, ans=0.1 2024-08-20 16:57:57,827 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 16:58:17,615 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13750, loss[loss=0.09485, beats_loss=0.008698, ecapa_loss=0.0001579, whisper_loss=0.08457, over 21187.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001405, whisper_loss=0.08879, over 3781551.66 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:58:56,615 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 16:58:57,937 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 16:59:03,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.268e+01 2.560e+01 2.819e+01 5.576e+01, threshold=5.121e+01, percent-clipped=1.0 2024-08-20 16:59:24,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4879480.0, ans=0.125 2024-08-20 16:59:58,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4879680.0, ans=0.125 2024-08-20 16:59:59,066 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 26 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-20 17:00:15,484 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13800, loss[loss=0.1071, beats_loss=0.01245, ecapa_loss=0.0001163, whisper_loss=0.09348, over 12998.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.00014, whisper_loss=0.08885, over 3776066.53 frames. ], batch size: 51, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:00:39,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=12.0 2024-08-20 17:00:45,816 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 17:01:03,304 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 17:01:23,088 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 17:01:55,324 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 17:02:06,880 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 17:02:14,267 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13850, loss[loss=0.1113, beats_loss=0.009823, ecapa_loss=0.0001507, whisper_loss=0.09992, over 21474.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.08895, over 3772381.02 frames. ], batch size: 86, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:02:32,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4880280.0, ans=0.0 2024-08-20 17:02:38,294 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 17:02:40,872 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 21 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 17:02:45,915 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 17:02:50,912 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 22 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 17:03:01,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.225e+01 2.417e+01 2.709e+01 3.540e+01, threshold=4.834e+01, percent-clipped=0.0 2024-08-20 17:03:31,505 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 14 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 17:03:35,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-20 17:03:35,957 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 18 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 17:03:47,927 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 19 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 17:03:50,035 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 17:04:10,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-20 17:04:10,312 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13900, loss[loss=0.1183, beats_loss=0.00839, ecapa_loss=0.0001138, whisper_loss=0.1088, over 18645.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01051, ecapa_loss=0.0001393, whisper_loss=0.08862, over 3776283.56 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:04:15,204 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 27 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 17:04:22,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4880780.0, ans=0.2 2024-08-20 17:04:23,910 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 21 from LS+wenet, 18 from Vox, 51 fro AS 2024-08-20 17:04:27,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4880780.0, ans=0.125 2024-08-20 17:04:37,747 WARNING [optim.py:496] (3/4) Scaling gradients by 0.016832223162055016, model_norm_threshold=48.33732604980469 2024-08-20 17:04:37,913 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.526e+05, grad_sumsq=7.526e+05, orig_rms_sq=1.000e+00 2024-08-20 17:04:57,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4880980.0, ans=0.0 2024-08-20 17:05:07,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.96 vs. 
limit=15.0 2024-08-20 17:05:24,833 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:05:36,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2024-08-20 17:05:55,304 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 17:05:57,360 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 13950, loss[loss=0.09586, beats_loss=0.01192, ecapa_loss=0.0001761, whisper_loss=0.08218, over 15659.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01063, ecapa_loss=0.0001397, whisper_loss=0.08858, over 3771668.43 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:06:24,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-20 17:06:37,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4881380.0, ans=0.2 2024-08-20 17:06:40,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.333e+01 2.602e+01 2.929e+01 2.872e+03, threshold=5.204e+01, percent-clipped=2.0 2024-08-20 17:06:44,217 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 17:06:46,501 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 28 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 17:06:50,643 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 24 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-20 17:07:05,722 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 17:07:18,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2024-08-20 17:07:34,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4881680.0, ans=0.125 2024-08-20 17:07:42,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4881780.0, ans=0.0 2024-08-20 17:07:43,296 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14000, loss[loss=0.09773, beats_loss=0.01067, ecapa_loss=0.0001585, whisper_loss=0.08548, over 14533.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01059, ecapa_loss=0.0001402, whisper_loss=0.08925, over 3798321.75 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:07:45,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4881780.0, ans=0.04949747468305833 2024-08-20 17:08:14,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4881880.0, ans=0.0 2024-08-20 17:08:27,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.54 vs. 
limit=12.0 2024-08-20 17:09:17,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4882180.0, ans=0.0 2024-08-20 17:09:19,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4882180.0, ans=0.09899494936611666 2024-08-20 17:09:33,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4882180.0, ans=0.125 2024-08-20 17:09:33,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2024-08-20 17:09:37,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4882280.0, ans=0.2 2024-08-20 17:09:38,435 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14050, loss[loss=0.116, beats_loss=0.008218, ecapa_loss=0.0001311, whisper_loss=0.1065, over 20978.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0105, ecapa_loss=0.0001394, whisper_loss=0.08894, over 3768068.76 frames. ], batch size: 79, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:09:47,609 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 17:10:13,616 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 17:10:25,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.229e+01 2.456e+01 2.749e+01 5.293e+01, threshold=4.913e+01, percent-clipped=1.0 2024-08-20 17:10:49,782 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 17:11:18,393 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 
20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 17:11:36,466 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14100, loss[loss=0.1069, beats_loss=0.01116, ecapa_loss=0.0001108, whisper_loss=0.09464, over 23583.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001383, whisper_loss=0.08949, over 3775353.73 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:11:40,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4882780.0, ans=0.09899494936611666 2024-08-20 17:11:48,346 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 26 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-20 17:12:25,052 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 17:12:26,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2024-08-20 17:12:43,554 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 22 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-20 17:12:54,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4883080.0, ans=15.0 2024-08-20 17:13:05,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2024-08-20 17:13:15,476 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 17:13:16,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2024-08-20 17:13:23,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. 
limit=15.0 2024-08-20 17:13:26,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.99 vs. limit=6.0 2024-08-20 17:13:33,715 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14150, loss[loss=0.1101, beats_loss=0.009487, ecapa_loss=0.00019, whisper_loss=0.09868, over 14803.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001374, whisper_loss=0.08952, over 3776940.82 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:13:39,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4883280.0, ans=0.05 2024-08-20 17:13:41,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4883280.0, ans=0.0 2024-08-20 17:13:45,292 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-20 17:13:58,171 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 17:14:18,496 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.324e+01 2.479e+01 2.720e+01 4.062e+01, threshold=4.958e+01, percent-clipped=0.0 2024-08-20 17:14:32,274 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 17:14:34,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4883480.0, ans=0.125 2024-08-20 17:14:52,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4883580.0, ans=0.1 2024-08-20 17:14:52,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.06 vs. 
limit=15.0 2024-08-20 17:15:25,199 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14200, loss[loss=0.08291, beats_loss=0.01086, ecapa_loss=0.0001391, whisper_loss=0.07067, over 16345.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001373, whisper_loss=0.09034, over 3799466.55 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:15:31,200 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 17:15:34,692 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 17:15:50,739 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 17:16:01,794 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-20 17:16:07,552 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2024-08-20 17:16:11,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4883980.0, ans=0.0 2024-08-20 17:16:23,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4883980.0, ans=0.0 2024-08-20 17:16:27,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. 
limit=15.0 2024-08-20 17:16:42,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4884080.0, ans=0.125 2024-08-20 17:16:44,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=4884080.0, ans=0.2 2024-08-20 17:16:45,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4884080.0, ans=0.125 2024-08-20 17:17:09,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4884280.0, ans=0.2 2024-08-20 17:17:10,857 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14250, loss[loss=0.1388, beats_loss=0.009298, ecapa_loss=0.0001033, whisper_loss=0.1285, over 14812.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.09024, over 3804106.20 frames. ], batch size: 55, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:17:24,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4884280.0, ans=0.0 2024-08-20 17:17:27,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4884280.0, ans=0.0 2024-08-20 17:17:27,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4884280.0, ans=0.0 2024-08-20 17:17:53,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.287e+01 2.497e+01 2.839e+01 4.280e+02, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 17:17:55,464 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 17:18:06,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4884480.0, ans=0.1 2024-08-20 17:18:11,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2024-08-20 17:18:17,343 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 17:18:52,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4884780.0, ans=0.5 2024-08-20 17:18:53,726 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14300, loss[loss=0.09764, beats_loss=0.01118, ecapa_loss=0.0001623, whisper_loss=0.08483, over 20633.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001382, whisper_loss=0.09076, over 3804473.37 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:18:57,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4884780.0, ans=0.1 2024-08-20 17:19:26,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4884880.0, ans=0.125 2024-08-20 17:20:15,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4885080.0, ans=0.125 2024-08-20 17:20:20,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4885180.0, ans=0.125 2024-08-20 17:20:23,847 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 17:20:38,295 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14350, loss[loss=0.1076, beats_loss=0.008688, ecapa_loss=0.000186, whisper_loss=0.09703, over 15186.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.09116, over 3786281.51 frames. ], batch size: 63, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:20:51,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=8.0 2024-08-20 17:20:58,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4885380.0, ans=0.125 2024-08-20 17:21:19,473 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.614e+01 2.424e+01 2.742e+01 3.115e+01 1.804e+02, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 17:21:20,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4885480.0, ans=0.125 2024-08-20 17:21:21,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4885480.0, ans=0.125 2024-08-20 17:21:34,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4885480.0, ans=0.2 2024-08-20 17:21:39,904 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 17:21:45,806 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 17:21:48,558 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 17:21:59,574 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 24 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 17:22:04,145 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
35 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 17:22:09,517 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 17:22:14,909 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 15 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 17:22:18,668 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14400, loss[loss=0.121, beats_loss=0.009934, ecapa_loss=0.00014, whisper_loss=0.1096, over 15803.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.0904, over 3756466.50 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:22:36,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4885780.0, ans=0.0 2024-08-20 17:22:41,428 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.014e+05 2024-08-20 17:22:58,789 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 23 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-20 17:23:05,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4885980.0, ans=0.125 2024-08-20 17:23:42,029 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 25 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-20 17:23:48,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4886180.0, ans=0.0 2024-08-20 17:23:57,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4886280.0, ans=0.125 2024-08-20 17:23:58,246 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14450, loss[loss=0.08501, beats_loss=0.01105, ecapa_loss=0.0001283, whisper_loss=0.07268, over 15062.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.000139, whisper_loss=0.08953, over 3708893.10 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:24:00,080 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 35 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 17:24:13,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4886280.0, ans=0.125 2024-08-20 17:24:15,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4886280.0, ans=0.5 2024-08-20 17:24:26,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4886380.0, ans=0.125 2024-08-20 17:24:29,640 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-20 17:24:30,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4886380.0, ans=0.035 2024-08-20 17:24:33,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2024-08-20 17:24:41,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4886480.0, ans=0.125 2024-08-20 17:24:41,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.260e+01 2.488e+01 2.810e+01 3.938e+01, threshold=4.976e+01, percent-clipped=0.0 2024-08-20 17:24:42,156 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 22 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 17:24:43,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2024-08-20 17:24:57,599 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 17:25:40,962 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14500, loss[loss=0.1015, beats_loss=0.01003, ecapa_loss=0.0001598, whisper_loss=0.08983, over 21912.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.09002, over 3755648.50 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:25:50,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4886780.0, ans=0.125 2024-08-20 17:26:45,731 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-20 17:27:25,745 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14550, loss[loss=0.1016, beats_loss=0.009798, ecapa_loss=0.0001459, whisper_loss=0.09038, over 23444.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01023, ecapa_loss=0.0001395, whisper_loss=0.09033, over 3765642.09 frames. ], batch size: 95, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:27:40,348 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 21 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-20 17:27:49,633 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 17:28:11,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.302e+01 2.517e+01 2.766e+01 3.665e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-20 17:28:19,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2024-08-20 17:29:11,118 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 13 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 17:29:15,855 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14600, loss[loss=0.09834, beats_loss=0.01095, ecapa_loss=9.124e-05, whisper_loss=0.08648, over 21437.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01027, ecapa_loss=0.0001386, whisper_loss=0.08995, over 3770966.15 frames. ], batch size: 81, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:29:22,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4887780.0, ans=0.125 2024-08-20 17:29:25,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2024-08-20 17:29:34,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4887780.0, ans=0.125 2024-08-20 17:29:37,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=12.0 2024-08-20 17:30:07,196 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 17:30:40,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4888080.0, ans=0.125 2024-08-20 17:31:02,582 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14650, loss[loss=0.08974, beats_loss=0.01068, ecapa_loss=0.0001235, whisper_loss=0.07782, over 21261.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01031, ecapa_loss=0.0001395, whisper_loss=0.08916, over 3769187.30 frames. ], batch size: 83, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:31:14,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4888280.0, ans=0.0 2024-08-20 17:31:46,259 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 17:31:48,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.228e+01 2.454e+01 2.785e+01 6.684e+01, threshold=4.907e+01, percent-clipped=2.0 2024-08-20 17:31:51,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4888480.0, ans=0.125 2024-08-20 17:32:12,691 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 17:32:20,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2024-08-20 17:32:31,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4888680.0, ans=0.1 2024-08-20 17:32:34,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4888680.0, ans=0.125 2024-08-20 17:32:40,075 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 24 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 17:32:44,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4888680.0, ans=0.125 2024-08-20 17:32:47,504 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 17:32:51,508 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14700, loss[loss=0.1116, beats_loss=0.008456, ecapa_loss=0.0001482, whisper_loss=0.1017, over 22799.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001386, whisper_loss=0.08999, over 3799349.94 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:33:05,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. 
limit=15.0 2024-08-20 17:33:15,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4888880.0, ans=0.1 2024-08-20 17:33:50,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-08-20 17:33:52,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4889080.0, ans=0.0 2024-08-20 17:33:56,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4889080.0, ans=0.125 2024-08-20 17:34:04,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2024-08-20 17:34:13,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-20 17:34:16,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-08-20 17:34:34,574 INFO [train_multi_KD3.py:1117] (3/4) Epoch 33, batch 14750, loss[loss=0.09532, beats_loss=0.007334, ecapa_loss=0.0001297, whisper_loss=0.08669, over 17238.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.08976, over 3786066.09 frames. 
], batch size: 63, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:34:48,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4889280.0, ans=0.125 2024-08-20 17:35:17,775 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.334e+01 2.652e+01 2.948e+01 4.454e+01, threshold=5.304e+01, percent-clipped=0.0 2024-08-20 17:35:23,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-08-20 17:35:56,726 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:36:34,081 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 0, loss[loss=0.09858, beats_loss=0.009754, ecapa_loss=0.0001429, whisper_loss=0.0874, over 19299.00 frames. ], tot_loss[loss=0.09858, beats_loss=0.009754, ecapa_loss=0.0001429, whisper_loss=0.0874, over 19299.00 frames. ], batch size: 76, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:36:34,082 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 17:37:09,453 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2495, over 931116.00 frames. 2024-08-20 17:37:22,365 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3684, 4.2562, 3.5506, 3.7485], device='cuda:3') 2024-08-20 17:37:31,857 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 2024-08-20 17:39:14,593 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 17:39:14,596 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB
2024-08-20 17:39:16,248 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 30 from LS+wenet, 16 from Vox, 39 from AS
2024-08-20 17:39:37,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0
2024-08-20 17:39:46,202 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 from AS
2024-08-20 17:40:35,814 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 from AS
2024-08-20 17:40:46,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4889990.0, ans=0.125
2024-08-20 17:41:06,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4890090.0, ans=0.125
2024-08-20 17:41:19,467 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 50, loss[loss=0.07779, beats_loss=0.0115, ecapa_loss=0.0001692, whisper_loss=0.0646, over 21133.00 frames. ], tot_loss[loss=0.09907, beats_loss=0.009549, ecapa_loss=0.000142, whisper_loss=0.0881, over 897115.88 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:41:21,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4890190.0, ans=0.0
2024-08-20 17:41:21,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4890190.0, ans=15.0
2024-08-20 17:41:50,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4890290.0, ans=0.0
2024-08-20 17:42:33,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.429e+01 2.698e+01 2.927e+01 5.810e+01, threshold=5.396e+01, percent-clipped=1.0
2024-08-20 17:42:48,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4890490.0, ans=0.0
2024-08-20 17:42:53,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4890490.0, ans=0.125
2024-08-20 17:43:03,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4890590.0, ans=0.0
2024-08-20 17:43:03,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. limit=10.0
2024-08-20 17:43:15,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4890590.0, ans=0.125
2024-08-20 17:43:23,666 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 100, loss[loss=0.1006, beats_loss=0.007419, ecapa_loss=0.0001358, whisper_loss=0.09182, over 16472.00 frames. ], tot_loss[loss=0.09946, beats_loss=0.009362, ecapa_loss=0.0001409, whisper_loss=0.08869, over 1510467.95 frames. ], batch size: 64, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:43:28,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0
2024-08-20 17:43:41,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4890690.0, ans=0.0
2024-08-20 17:43:46,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4890690.0, ans=10.0
2024-08-20 17:44:02,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-08-20 17:44:09,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5
2024-08-20 17:44:15,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4890890.0, ans=0.07
2024-08-20 17:44:21,394 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 21 from LS+wenet, 10 from Vox, 22 from AS
2024-08-20 17:44:42,328 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 23 from Vox, 23 from AS
2024-08-20 17:45:30,984 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 150, loss[loss=0.1131, beats_loss=0.01127, ecapa_loss=0.0001253, whisper_loss=0.1006, over 15137.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.009208, ecapa_loss=0.0001394, whisper_loss=0.08961, over 1996268.01 frames. ], batch size: 59, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:45:36,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4891190.0, ans=0.5
2024-08-20 17:46:07,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4891290.0, ans=0.125
2024-08-20 17:46:14,070 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 27 from Vox, 39 from AS
2024-08-20 17:46:23,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4891390.0, ans=0.1
2024-08-20 17:46:35,252 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.414e+01 2.624e+01 2.993e+01 4.090e+01, threshold=5.247e+01, percent-clipped=0.0
2024-08-20 17:46:44,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4891490.0, ans=0.125
2024-08-20 17:46:54,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0
2024-08-20 17:46:56,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4891590.0, ans=0.125
2024-08-20 17:46:57,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4891590.0, ans=0.0
2024-08-20 17:47:15,757 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 200, loss[loss=0.1099, beats_loss=0.009462, ecapa_loss=0.0001316, whisper_loss=0.09916, over 18688.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009378, ecapa_loss=0.000141, whisper_loss=0.0908, over 2433593.65 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:47:28,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4891690.0, ans=0.1
2024-08-20 17:47:37,729 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 from AS
2024-08-20 17:48:07,315 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 26 from LS+wenet, 23 from Vox, 32 from AS
2024-08-20 17:48:17,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. limit=22.5
2024-08-20 17:48:20,692 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-20 17:48:23,051 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 15 from LS+wenet, 14 from Vox, 24 from AS
2024-08-20 17:48:23,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4891990.0, ans=0.07
2024-08-20 17:48:50,913 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 250, loss[loss=0.09825, beats_loss=0.01018, ecapa_loss=0.0001684, whisper_loss=0.08639, over 16168.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009557, ecapa_loss=0.0001414, whisper_loss=0.0908, over 2737307.45 frames. ], batch size: 65, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:49:06,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-08-20 17:49:09,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4892290.0, ans=0.0
2024-08-20 17:49:21,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4892290.0, ans=0.2
2024-08-20 17:49:26,037 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 from AS
2024-08-20 17:49:27,925 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS
2024-08-20 17:49:30,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4892390.0, ans=0.2
2024-08-20 17:49:43,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4892390.0, ans=0.125
2024-08-20 17:49:48,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.222e+01 2.433e+01 2.766e+01 4.202e+01, threshold=4.866e+01, percent-clipped=0.0
2024-08-20 17:49:56,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0
2024-08-20 17:50:23,575 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 300, loss[loss=0.08957, beats_loss=0.007881, ecapa_loss=0.0001179, whisper_loss=0.08051, over 15971.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009818, ecapa_loss=0.0001389, whisper_loss=0.09015, over 2945650.22 frames. ], batch size: 58, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:50:33,063 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 24 from LS+wenet, 15 from Vox, 42 from AS
2024-08-20 17:50:34,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4892690.0, ans=0.0
2024-08-20 17:51:08,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4892890.0, ans=0.125
2024-08-20 17:51:20,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4892990.0, ans=0.1
2024-08-20 17:51:30,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4892990.0, ans=0.0
2024-08-20 17:51:55,174 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 350, loss[loss=0.09535, beats_loss=0.01127, ecapa_loss=0.0001069, whisper_loss=0.08301, over 21889.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009893, ecapa_loss=0.0001392, whisper_loss=0.09042, over 3138693.05 frames. ], batch size: 85, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:52:09,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4893190.0, ans=0.125
2024-08-20 17:52:16,226 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 from AS
2024-08-20 17:52:20,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0
2024-08-20 17:52:21,634 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 17:52:27,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4893290.0, ans=0.125
2024-08-20 17:52:46,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4893390.0, ans=0.1
2024-08-20 17:52:48,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.330e+01 2.558e+01 2.874e+01 1.855e+02, threshold=5.116e+01, percent-clipped=1.0
2024-08-20 17:52:50,796 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 12 from Vox, 27 from AS
2024-08-20 17:52:58,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4893490.0, ans=0.0
2024-08-20 17:53:07,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4893590.0, ans=0.125
2024-08-20 17:53:08,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4893590.0, ans=0.0
2024-08-20 17:53:17,223 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 from AS
2024-08-20 17:53:22,325 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 15 from LS+wenet, 16 from Vox, 28 from AS
2024-08-20 17:53:26,064 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 400, loss[loss=0.0829, beats_loss=0.01074, ecapa_loss=0.0001604, whisper_loss=0.07056, over 13391.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01003, ecapa_loss=0.0001381, whisper_loss=0.089, over 3233122.63 frames. ], batch size: 56, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:53:39,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4893690.0, ans=0.2
2024-08-20 17:53:46,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4893790.0, ans=0.1
2024-08-20 17:54:01,094 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS
2024-08-20 17:54:01,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4893890.0, ans=0.125
2024-08-20 17:54:22,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4893990.0, ans=0.125
2024-08-20 17:54:34,451 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 26 from Vox, 32 from AS
2024-08-20 17:54:39,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4894090.0, ans=0.1
2024-08-20 17:54:55,810 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 450, loss[loss=0.09636, beats_loss=0.01189, ecapa_loss=0.0001238, whisper_loss=0.08323, over 21985.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01009, ecapa_loss=0.0001384, whisper_loss=0.08976, over 3378060.27 frames. ], batch size: 89, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:54:56,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4894190.0, ans=0.125
2024-08-20 17:55:01,807 WARNING [optim.py:496] (3/4) Scaling gradients by 0.014215901494026184, model_norm_threshold=51.16255569458008
2024-08-20 17:55:01,975 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.649e+06, grad_sumsq=5.014e+05, orig_rms_sq=3.288e+00
2024-08-20 17:55:22,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0
2024-08-20 17:55:26,987 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 24 from LS+wenet, 14 from Vox, 33 from AS
2024-08-20 17:55:37,788 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 from AS
2024-08-20 17:55:50,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4894490.0, ans=0.125
2024-08-20 17:55:51,327 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.355e+01 2.533e+01 2.775e+01 3.599e+03, threshold=5.067e+01, percent-clipped=2.0
2024-08-20 17:55:54,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4894490.0, ans=0.125
2024-08-20 17:56:03,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.44 vs. limit=10.0
2024-08-20 17:56:15,876 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 from AS
2024-08-20 17:56:27,834 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 500, loss[loss=0.07529, beats_loss=0.011, ecapa_loss=0.0001333, whisper_loss=0.06296, over 16153.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01015, ecapa_loss=0.0001382, whisper_loss=0.08982, over 3491280.95 frames. ], batch size: 65, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:56:34,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4894690.0, ans=0.025
2024-08-20 17:56:42,918 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS
2024-08-20 17:56:48,127 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 19 from LS+wenet, 18 from Vox, 34 from AS
2024-08-20 17:57:20,636 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 25 from LS+wenet, 13 from Vox, 30 from AS
2024-08-20 17:57:31,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4894990.0, ans=0.0
2024-08-20 17:57:41,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4895090.0, ans=0.0
2024-08-20 17:57:58,251 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 550, loss[loss=0.1181, beats_loss=0.009615, ecapa_loss=0.0001171, whisper_loss=0.1073, over 20231.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01019, ecapa_loss=0.000137, whisper_loss=0.08996, over 3544104.56 frames. ], batch size: 75, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:58:03,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4895190.0, ans=0.125
2024-08-20 17:58:12,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4895190.0, ans=10.0
2024-08-20 17:58:25,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=4895290.0, ans=12.0
2024-08-20 17:58:52,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.257e+01 2.490e+01 2.718e+01 3.602e+01, threshold=4.981e+01, percent-clipped=0.0
2024-08-20 17:58:53,018 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 24 from LS+wenet, 14 from Vox, 16 from AS
2024-08-20 17:58:59,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4895490.0, ans=0.0
2024-08-20 17:59:12,740 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 from AS
2024-08-20 17:59:23,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4895590.0, ans=0.125
2024-08-20 17:59:23,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4895590.0, ans=0.125
2024-08-20 17:59:27,724 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 600, loss[loss=0.08625, beats_loss=0.01121, ecapa_loss=0.0001247, whisper_loss=0.0738, over 22293.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01012, ecapa_loss=0.0001376, whisper_loss=0.09028, over 3626640.85 frames. ], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 17:59:40,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4895690.0, ans=0.125
2024-08-20 17:59:53,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4895790.0, ans=0.2
2024-08-20 17:59:58,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0
2024-08-20 18:00:01,034 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 from AS
2024-08-20 18:00:14,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4895890.0, ans=0.125
2024-08-20 18:00:21,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4895990.0, ans=0.0
2024-08-20 18:00:28,858 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 from AS
2024-08-20 18:00:52,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5
2024-08-20 18:00:56,856 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 650, loss[loss=0.09718, beats_loss=0.01091, ecapa_loss=0.0001496, whisper_loss=0.08478, over 20802.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01019, ecapa_loss=0.0001369, whisper_loss=0.08931, over 3640041.71 frames. ], batch size: 84, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:01:05,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5
2024-08-20 18:01:16,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4896290.0, ans=0.05
2024-08-20 18:01:17,335 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS
2024-08-20 18:01:30,132 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 from AS
2024-08-20 18:01:36,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4896390.0, ans=0.1
2024-08-20 18:01:51,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.174e+01 2.397e+01 2.738e+01 4.303e+01, threshold=4.793e+01, percent-clipped=0.0
2024-08-20 18:01:52,009 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 from AS
2024-08-20 18:01:57,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4896490.0, ans=0.1
2024-08-20 18:02:12,441 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0
2024-08-20 18:02:17,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5
2024-08-20 18:02:26,545 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 700, loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001464, whisper_loss=0.08906, over 22228.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01024, ecapa_loss=0.0001368, whisper_loss=0.08934, over 3685589.68 frames. ], batch size: 91, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:02:38,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4896690.0, ans=0.2
2024-08-20 18:02:43,481 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 18:03:24,799 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 16 from LS+wenet, 23 from Vox, 24 from AS
2024-08-20 18:03:29,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4896990.0, ans=0.05
2024-08-20 18:03:34,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4896990.0, ans=0.04949747468305833
2024-08-20 18:03:51,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.01 vs. limit=15.0
2024-08-20 18:03:57,682 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 750, loss[loss=0.09201, beats_loss=0.01115, ecapa_loss=0.0001443, whisper_loss=0.07941, over 22854.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01018, ecapa_loss=0.0001379, whisper_loss=0.08937, over 3705147.71 frames. ], batch size: 96, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:03:59,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.24 vs. limit=22.5
2024-08-20 18:04:04,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4897190.0, ans=0.125
2024-08-20 18:04:22,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0
2024-08-20 18:04:31,858 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 21 from LS+wenet, 17 from Vox, 33 from AS
2024-08-20 18:04:43,813 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 26 from LS+wenet, 14 from Vox, 20 from AS
2024-08-20 18:04:46,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4897390.0, ans=0.125
2024-08-20 18:04:47,243 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 from AS
2024-08-20 18:04:49,138 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS
2024-08-20 18:04:50,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.248e+01 2.436e+01 2.663e+01 4.558e+01, threshold=4.873e+01, percent-clipped=0.0
2024-08-20 18:05:02,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4897490.0, ans=0.125
2024-08-20 18:05:15,560 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 from AS
2024-08-20 18:05:17,253 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 22 from LS+wenet, 11 from Vox, 38 from AS
2024-08-20 18:05:25,691 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 800, loss[loss=0.116, beats_loss=0.008652, ecapa_loss=0.0001814, whisper_loss=0.1056, over 19484.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01028, ecapa_loss=0.0001376, whisper_loss=0.08865, over 3715758.31 frames. ], batch size: 80, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:05:27,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0
2024-08-20 18:05:29,773 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 from AS
2024-08-20 18:05:42,444 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.468e+00
2024-08-20 18:05:47,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4897790.0, ans=0.0
2024-08-20 18:05:52,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5
2024-08-20 18:06:08,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4897890.0, ans=0.2
2024-08-20 18:06:13,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4897890.0, ans=0.125
2024-08-20 18:06:16,526 WARNING [optim.py:496] (3/4) Scaling gradients by 0.034612394869327545, model_norm_threshold=48.72909927368164
2024-08-20 18:06:16,696 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.559e+05, grad_sumsq=3.559e+05, orig_rms_sq=1.000e+00
2024-08-20 18:06:36,057 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS
2024-08-20 18:06:49,784 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 20 from LS+wenet, 14 from Vox, 16 from AS
2024-08-20 18:06:52,584 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 850, loss[loss=0.09027, beats_loss=0.00837, ecapa_loss=0.0001532, whisper_loss=0.08037, over 13868.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0103, ecapa_loss=0.0001382, whisper_loss=0.08876, over 3692498.63 frames. ], batch size: 54, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:07:03,345 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 from AS
2024-08-20 18:07:45,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4898490.0, ans=0.125
2024-08-20 18:07:45,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.491e+01 2.263e+01 2.466e+01 2.785e+01 1.408e+03, threshold=4.933e+01, percent-clipped=1.0
2024-08-20 18:07:51,786 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 15 from LS+wenet, 18 from Vox, 18 from AS
2024-08-20 18:08:11,979 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 from AS
2024-08-20 18:08:20,786 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 from AS
2024-08-20 18:08:22,073 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 900, loss[loss=0.1183, beats_loss=0.008703, ecapa_loss=0.0001464, whisper_loss=0.1081, over 22943.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01024, ecapa_loss=0.0001387, whisper_loss=0.08907, over 3731421.40 frames. ], batch size: 91, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:08:28,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4898690.0, ans=0.0
2024-08-20 18:09:08,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4898890.0, ans=0.1
2024-08-20 18:09:17,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4898990.0, ans=0.1
2024-08-20 18:09:20,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4898990.0, ans=0.0
2024-08-20 18:09:44,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0
2024-08-20 18:09:52,098 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 950, loss[loss=0.08463, beats_loss=0.009477, ecapa_loss=0.0001557, whisper_loss=0.07359, over 13706.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01017, ecapa_loss=0.0001381, whisper_loss=0.08931, over 3715698.75 frames. ], batch size: 52, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:10:10,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.13 vs. limit=10.0
2024-08-20 18:10:29,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4899390.0, ans=0.125
2024-08-20 18:10:32,576 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.985e+00
2024-08-20 18:10:40,106 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03566131740808487, model_norm_threshold=49.32598114013672
2024-08-20 18:10:40,277 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.124e+05, grad_sumsq=3.124e+05, orig_rms_sq=1.000e+00
2024-08-20 18:10:43,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.227e+01 2.460e+01 2.712e+01 1.383e+03, threshold=4.920e+01, percent-clipped=1.0
2024-08-20 18:10:51,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4899490.0, ans=0.125
2024-08-20 18:10:55,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4899490.0, ans=0.125
2024-08-20 18:11:07,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4899590.0, ans=0.125
2024-08-20 18:11:14,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4899590.0, ans=0.0
2024-08-20 18:11:20,425 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1000, loss[loss=0.1032, beats_loss=0.0115, ecapa_loss=0.0001235, whisper_loss=0.09049, over 22600.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01017, ecapa_loss=0.0001385, whisper_loss=0.08905, over 3707079.88 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:11:27,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.39 vs. limit=22.5
2024-08-20 18:11:28,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4899690.0, ans=0.125
2024-08-20 18:11:36,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4899690.0, ans=0.125
2024-08-20 18:11:47,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4899790.0, ans=0.0
2024-08-20 18:12:03,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4899890.0, ans=0.125
2024-08-20 18:12:05,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4899890.0, ans=0.0
2024-08-20 18:12:14,207 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 24 from LS+wenet, 17 from Vox, 42 from AS
2024-08-20 18:12:31,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4900090.0, ans=0.07
2024-08-20 18:12:44,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0
2024-08-20 18:12:47,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4900090.0, ans=0.0
2024-08-20 18:12:50,727 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1050, loss[loss=0.1281, beats_loss=0.00849, ecapa_loss=0.0001478, whisper_loss=0.1181, over 20149.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0102, ecapa_loss=0.0001378, whisper_loss=0.08918, over 3736825.78 frames. ], batch size: 78, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:12:52,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5
2024-08-20 18:13:11,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4900290.0, ans=0.125
2024-08-20 18:13:15,110 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 26 from LS+wenet, 28 from Vox, 31 from AS
2024-08-20 18:13:16,701 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 17 from Vox, 45 from AS
2024-08-20 18:13:25,509 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 26 from LS+wenet, 14 from Vox, 29 from AS
2024-08-20 18:13:26,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.16 vs. limit=8.0
2024-08-20 18:13:32,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4900390.0, ans=0.0
2024-08-20 18:13:43,083 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.230e+01 2.407e+01 2.713e+01 3.528e+01, threshold=4.815e+01, percent-clipped=0.0
2024-08-20 18:14:17,683 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1100, loss[loss=0.102, beats_loss=0.01161, ecapa_loss=0.0001186, whisper_loss=0.08921, over 24210.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01031, ecapa_loss=0.000137, whisper_loss=0.08861, over 3767361.13 frames. ], batch size: 93, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:14:30,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4900690.0, ans=0.125
2024-08-20 18:15:19,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4900990.0, ans=0.0
2024-08-20 18:15:24,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0
2024-08-20 18:15:39,862 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 21 from LS+wenet, 13 from Vox, 17 from AS
2024-08-20 18:15:44,768 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1150, loss[loss=0.1127, beats_loss=0.007457, ecapa_loss=0.000158, whisper_loss=0.1036, over 18844.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01034, ecapa_loss=0.0001369, whisper_loss=0.08892, over 3738161.05 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 5.764607523034235e+17
2024-08-20 18:16:05,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4901290.0, ans=0.0
2024-08-20 18:16:18,137 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.213e+01
2024-08-20 18:16:38,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.288e+01 2.529e+01 2.855e+01 5.753e+01, threshold=5.059e+01, percent-clipped=2.0
2024-08-20 18:16:41,257 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 from AS
2024-08-20 18:16:45,010 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts.
36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-20 18:16:50,075 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 18:16:57,102 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 39 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-20 18:17:06,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4901590.0, ans=0.0 2024-08-20 18:17:14,347 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1200, loss[loss=0.1131, beats_loss=0.01029, ecapa_loss=0.0001468, whisper_loss=0.1013, over 18716.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01024, ecapa_loss=0.0001372, whisper_loss=0.0896, over 3741950.32 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:17:49,414 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 12 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 18:17:54,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4901890.0, ans=0.125 2024-08-20 18:17:58,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4901890.0, ans=0.0 2024-08-20 18:18:09,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4901990.0, ans=0.125 2024-08-20 18:18:20,811 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 18:18:21,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4901990.0, ans=0.1 2024-08-20 18:18:40,291 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 18:18:43,504 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1250, loss[loss=0.1025, beats_loss=0.008553, ecapa_loss=0.0001324, whisper_loss=0.0926, over 13919.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01032, ecapa_loss=0.000137, whisper_loss=0.0891, over 3742473.30 frames. ], batch size: 51, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:18:57,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4902190.0, ans=0.1 2024-08-20 18:19:11,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-08-20 18:19:24,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-20 18:19:35,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.263e+01 2.557e+01 2.836e+01 4.039e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-20 18:19:40,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0 2024-08-20 18:19:44,864 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 20 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-20 18:19:45,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4902490.0, ans=0.0 2024-08-20 18:19:53,968 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 13 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 18:20:11,668 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1300, loss[loss=0.08364, beats_loss=0.01015, ecapa_loss=0.0001454, whisper_loss=0.07204, over 15544.00 frames. 
], tot_loss[loss=0.0997, beats_loss=0.01031, ecapa_loss=0.0001371, whisper_loss=0.08803, over 3700423.30 frames. ], batch size: 61, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:20:34,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4902790.0, ans=0.125 2024-08-20 18:21:21,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4902990.0, ans=0.1 2024-08-20 18:21:30,748 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 18:21:41,567 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1350, loss[loss=0.09692, beats_loss=0.008967, ecapa_loss=0.0001517, whisper_loss=0.08644, over 19911.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01034, ecapa_loss=0.0001367, whisper_loss=0.08806, over 3703825.33 frames. ], batch size: 79, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:21:47,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4903190.0, ans=0.95 2024-08-20 18:22:03,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4903290.0, ans=0.125 2024-08-20 18:22:07,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4903290.0, ans=0.0 2024-08-20 18:22:15,777 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 18:22:28,604 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:22:33,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4903490.0, ans=0.0 2024-08-20 18:22:34,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.265e+01 2.449e+01 2.794e+01 7.955e+01, threshold=4.899e+01, percent-clipped=1.0 2024-08-20 18:22:58,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4903590.0, ans=0.125 2024-08-20 18:23:00,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4903590.0, ans=0.125 2024-08-20 18:23:10,312 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1400, loss[loss=0.1115, beats_loss=0.009749, ecapa_loss=0.0001339, whisper_loss=0.1005, over 22986.00 frames. ], tot_loss[loss=0.09999, beats_loss=0.01031, ecapa_loss=0.0001369, whisper_loss=0.08831, over 3705773.32 frames. ], batch size: 89, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:23:21,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4903690.0, ans=0.0 2024-08-20 18:23:33,236 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 18:24:02,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-08-20 18:24:38,126 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1450, loss[loss=0.1057, beats_loss=0.01005, ecapa_loss=0.0001464, whisper_loss=0.09421, over 22213.00 frames. ], tot_loss[loss=0.09893, beats_loss=0.01039, ecapa_loss=0.0001368, whisper_loss=0.08717, over 3713232.01 frames. 
], batch size: 89, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:25:07,263 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 28 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 18:25:10,696 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 18:25:16,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4904390.0, ans=0.125 2024-08-20 18:25:31,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.37 vs. limit=10.0 2024-08-20 18:25:32,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.130e+01 2.447e+01 2.778e+01 4.776e+01, threshold=4.894e+01, percent-clipped=0.0 2024-08-20 18:25:35,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5 2024-08-20 18:26:06,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4904490.0, ans=0.2 2024-08-20 18:26:06,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-20 18:26:08,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-08-20 18:26:15,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4904590.0, ans=0.125 2024-08-20 18:26:32,485 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1500, loss[loss=0.0929, beats_loss=0.01167, ecapa_loss=0.0001248, whisper_loss=0.07998, over 16673.00 frames. 
], tot_loss[loss=0.09942, beats_loss=0.01038, ecapa_loss=0.0001371, whisper_loss=0.08767, over 3760863.95 frames. ], batch size: 65, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:26:39,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4904690.0, ans=0.125 2024-08-20 18:26:43,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.28 vs. limit=22.5 2024-08-20 18:26:47,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4904690.0, ans=0.125 2024-08-20 18:26:54,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4904790.0, ans=0.125 2024-08-20 18:27:14,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4904890.0, ans=0.0 2024-08-20 18:27:18,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4904890.0, ans=0.1 2024-08-20 18:27:30,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4904990.0, ans=0.0 2024-08-20 18:27:38,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-08-20 18:27:41,273 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
22 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 18:28:00,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4905090.0, ans=0.125 2024-08-20 18:28:04,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-20 18:28:04,913 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1550, loss[loss=0.1168, beats_loss=0.007069, ecapa_loss=0.0001433, whisper_loss=0.1083, over 18374.00 frames. ], tot_loss[loss=0.09944, beats_loss=0.01041, ecapa_loss=0.0001372, whisper_loss=0.08766, over 3757040.03 frames. ], batch size: 67, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:28:10,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=4905190.0, ans=0.2 2024-08-20 18:28:13,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4905190.0, ans=0.125 2024-08-20 18:28:19,216 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.096e-03 2024-08-20 18:28:43,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4905390.0, ans=0.125 2024-08-20 18:28:48,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4905390.0, ans=0.1 2024-08-20 18:28:50,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4905390.0, ans=0.0 2024-08-20 18:29:01,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.222e+01 2.378e+01 2.673e+01 8.948e+01, threshold=4.757e+01, percent-clipped=1.0 2024-08-20 18:29:38,725 INFO [train_multi_KD3.py:1117] (3/4) 
Epoch 34, batch 1600, loss[loss=0.1156, beats_loss=0.008931, ecapa_loss=0.0001372, whisper_loss=0.1053, over 17953.00 frames. ], tot_loss[loss=0.09979, beats_loss=0.01031, ecapa_loss=0.0001371, whisper_loss=0.08811, over 3744290.63 frames. ], batch size: 70, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:30:17,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4905890.0, ans=0.125 2024-08-20 18:30:33,981 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0239420123398304, model_norm_threshold=47.56806564331055 2024-08-20 18:30:34,149 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.639e+05, grad_sumsq=5.639e+05, orig_rms_sq=1.000e+00 2024-08-20 18:30:49,764 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 26 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 18:30:56,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4906090.0, ans=0.0 2024-08-20 18:30:59,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4906090.0, ans=0.125 2024-08-20 18:31:02,016 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05985227972269058, model_norm_threshold=47.56806564331055 2024-08-20 18:31:02,185 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.784e+04, grad_sumsq=6.784e+04, orig_rms_sq=1.000e+00 2024-08-20 18:31:07,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4906090.0, ans=0.125 2024-08-20 18:31:10,632 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1650, loss[loss=0.1176, 
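Note that `grad_scale` doubles from 5.764607523034235e+17 to 1.152921504606847e+18 around batch 1600 and later halves back. This is the usual dynamic loss-scaling pattern for reduced-precision training: grow the scale after a run of overflow-free steps, back off when an overflow is detected. A generic sketch of that scheme — an assumption about the behavior behind the logged values, not the project's actual scaler:

```python
class DynamicGradScaler:
    """Generic dynamic loss-scaling sketch (assumed behavior behind the
    log's grad_scale field; not icefall's actual implementation):
    double the scale after `growth_interval` clean steps, halve on overflow."""

    def __init__(self, init_scale=2.0**16, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_inf):
        if found_inf:
            self.scale /= 2.0       # back off after an overflow
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps >= self.growth_interval:
                self.scale *= 2.0   # grow after a run of clean steps
                self._clean_steps = 0
```

Under this scheme the step from 5.76e+17 to 1.15e+18 is one growth event, and the later return to 5.76e+17 is one backoff.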
beats_loss=0.008279, ecapa_loss=0.0001562, whisper_loss=0.1077, over 18691.00 frames. ], tot_loss[loss=0.09971, beats_loss=0.01037, ecapa_loss=0.0001371, whisper_loss=0.08797, over 3779523.08 frames. ], batch size: 72, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:31:11,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4906190.0, ans=0.1 2024-08-20 18:31:34,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4906290.0, ans=0.0 2024-08-20 18:31:35,984 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 18:31:39,989 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 24 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 18:31:45,527 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 18:31:47,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4906390.0, ans=0.05 2024-08-20 18:31:58,490 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.733e+00 2024-08-20 18:32:04,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.417e+01 2.742e+01 3.192e+01 1.987e+03, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 18:32:07,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4906490.0, ans=0.2 2024-08-20 18:32:07,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2024-08-20 18:32:16,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. 
limit=15.0 2024-08-20 18:32:21,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0 2024-08-20 18:32:24,315 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 27 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 18:32:24,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4906590.0, ans=0.125 2024-08-20 18:32:39,583 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1700, loss[loss=0.09784, beats_loss=0.01077, ecapa_loss=0.000119, whisper_loss=0.08588, over 19381.00 frames. ], tot_loss[loss=0.09974, beats_loss=0.01032, ecapa_loss=0.000137, whisper_loss=0.08804, over 3767112.96 frames. ], batch size: 78, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:33:04,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4906790.0, ans=0.125 2024-08-20 18:33:26,173 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 24 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 18:33:35,873 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 18:33:36,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4906990.0, ans=0.0 2024-08-20 18:33:39,012 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 34 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-20 18:33:45,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-20 18:33:54,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.77 vs. 
limit=12.0 2024-08-20 18:34:01,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4907090.0, ans=0.1 2024-08-20 18:34:02,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=15.0 2024-08-20 18:34:11,705 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1750, loss[loss=0.08253, beats_loss=0.01327, ecapa_loss=0.0001141, whisper_loss=0.06812, over 18926.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01039, ecapa_loss=0.0001351, whisper_loss=0.0886, over 3723842.16 frames. ], batch size: 76, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:34:17,082 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 18:34:37,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4907290.0, ans=0.1 2024-08-20 18:34:39,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=12.0 2024-08-20 18:34:46,104 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 18:34:50,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.62 vs. 
limit=15.0 2024-08-20 18:34:55,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4907390.0, ans=0.125 2024-08-20 18:35:05,590 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.254e+01 2.590e+01 2.933e+01 3.656e+02, threshold=5.181e+01, percent-clipped=1.0 2024-08-20 18:35:39,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4907690.0, ans=0.07 2024-08-20 18:35:40,831 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1800, loss[loss=0.08849, beats_loss=0.01248, ecapa_loss=0.0001087, whisper_loss=0.07493, over 20316.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01037, ecapa_loss=0.0001351, whisper_loss=0.08894, over 3745013.75 frames. ], batch size: 81, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:35:50,437 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 18:35:54,180 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.908e+00 2024-08-20 18:35:54,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4907690.0, ans=0.125 2024-08-20 18:36:12,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4907790.0, ans=0.125 2024-08-20 18:36:18,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4907890.0, ans=0.1 2024-08-20 18:36:20,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. 
limit=15.0 2024-08-20 18:36:25,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4907890.0, ans=0.125 2024-08-20 18:36:29,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4907890.0, ans=0.125 2024-08-20 18:36:33,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4907990.0, ans=0.125 2024-08-20 18:36:33,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-20 18:36:39,238 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 30 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 18:36:46,088 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 25 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 18:36:51,564 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 25 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 18:36:51,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4908090.0, ans=0.125 2024-08-20 18:37:09,310 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1850, loss[loss=0.1099, beats_loss=0.008634, ecapa_loss=0.0001309, whisper_loss=0.09996, over 23028.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01048, ecapa_loss=0.0001343, whisper_loss=0.08837, over 3750158.37 frames. ], batch size: 91, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:37:24,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4908190.0, ans=0.0 2024-08-20 18:37:40,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4908290.0, ans=0.125 2024-08-20 18:37:45,741 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 18:37:57,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4908390.0, ans=0.125 2024-08-20 18:38:00,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-08-20 18:38:04,907 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.237e+01 2.438e+01 2.771e+01 3.802e+01, threshold=4.876e+01, percent-clipped=0.0 2024-08-20 18:38:09,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-20 18:38:14,426 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:38:16,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4908490.0, ans=0.125 2024-08-20 18:38:27,499 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 18:38:43,213 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1900, loss[loss=0.1027, beats_loss=0.01117, ecapa_loss=0.0001281, whisper_loss=0.09024, over 17204.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001342, whisper_loss=0.08847, over 3749143.70 frames. ], batch size: 69, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:38:54,078 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 18:39:01,316 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 18:39:20,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0 2024-08-20 18:39:23,067 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 29 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 18:39:23,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2024-08-20 18:39:27,039 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 18:39:46,049 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 26 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 18:39:53,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4908990.0, ans=10.0 2024-08-20 18:40:18,055 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 1950, loss[loss=0.09658, beats_loss=0.01108, ecapa_loss=0.0001333, whisper_loss=0.08417, over 22873.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01043, ecapa_loss=0.0001342, whisper_loss=0.08834, over 3757758.14 frames. 
], batch size: 90, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:40:23,144 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.875e+01 2024-08-20 18:40:41,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4909290.0, ans=0.125 2024-08-20 18:40:54,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4909390.0, ans=0.2 2024-08-20 18:41:02,054 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:41:14,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.290e+01 2.558e+01 2.755e+01 1.117e+02, threshold=5.115e+01, percent-clipped=1.0 2024-08-20 18:41:50,634 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2000, loss[loss=0.1041, beats_loss=0.01125, ecapa_loss=0.0001523, whisper_loss=0.09129, over 16614.00 frames. ], tot_loss[loss=0.09961, beats_loss=0.01052, ecapa_loss=0.0001327, whisper_loss=0.08776, over 3710121.33 frames. ], batch size: 67, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:42:08,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4909790.0, ans=0.0 2024-08-20 18:42:28,108 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 18:42:33,464 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 36 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 18:43:00,131 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
20 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 18:43:00,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4909990.0, ans=0.125 2024-08-20 18:43:10,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4910090.0, ans=0.0 2024-08-20 18:43:19,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4910190.0, ans=0.125 2024-08-20 18:43:20,490 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2050, loss[loss=0.08919, beats_loss=0.0133, ecapa_loss=0.000144, whisper_loss=0.07445, over 19089.00 frames. ], tot_loss[loss=0.09904, beats_loss=0.01056, ecapa_loss=0.0001324, whisper_loss=0.08716, over 3707006.40 frames. ], batch size: 79, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:43:24,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4910190.0, ans=0.0 2024-08-20 18:43:29,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=4910190.0, ans=0.1 2024-08-20 18:43:35,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4910190.0, ans=0.1 2024-08-20 18:43:39,945 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 16 from LS+wenet, 7 from Vox, 26 fro AS 2024-08-20 18:43:41,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=4910290.0, ans=0.2 2024-08-20 18:43:46,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.69 vs. 
limit=15.0 2024-08-20 18:43:56,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4910390.0, ans=0.125 2024-08-20 18:43:59,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4910390.0, ans=0.0 2024-08-20 18:44:03,069 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 26 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 18:44:14,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.216e+01 2.451e+01 2.843e+01 3.787e+02, threshold=4.902e+01, percent-clipped=2.0 2024-08-20 18:44:27,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4910490.0, ans=0.0 2024-08-20 18:44:28,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4910490.0, ans=0.125 2024-08-20 18:44:30,268 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 18 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 18:44:41,383 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 19 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-20 18:44:49,003 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2100, loss[loss=0.1167, beats_loss=0.009275, ecapa_loss=0.0001409, whisper_loss=0.106, over 22524.00 frames. ], tot_loss[loss=0.09856, beats_loss=0.01061, ecapa_loss=0.0001315, whisper_loss=0.08663, over 3738587.95 frames. 
], batch size: 89, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:44:53,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4910690.0, ans=0.0 2024-08-20 18:44:58,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4910690.0, ans=0.2 2024-08-20 18:45:04,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4910690.0, ans=0.0 2024-08-20 18:45:05,256 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 18:45:23,884 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 18:45:43,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4910990.0, ans=0.09899494936611666 2024-08-20 18:45:55,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4910990.0, ans=0.0 2024-08-20 18:46:10,007 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 18:46:17,948 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2150, loss[loss=0.1215, beats_loss=0.006911, ecapa_loss=0.0001356, whisper_loss=0.1132, over 14888.00 frames. ], tot_loss[loss=0.09879, beats_loss=0.01057, ecapa_loss=0.0001315, whisper_loss=0.0869, over 3730738.47 frames. ], batch size: 53, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:46:21,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-20 18:46:24,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.29 vs. 
limit=10.0 2024-08-20 18:46:43,211 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 34 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-20 18:47:09,447 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05905143544077873, model_norm_threshold=49.024410247802734 2024-08-20 18:47:09,617 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.632e+04, grad_sumsq=6.177e+06, orig_rms_sq=1.074e-02 2024-08-20 18:47:10,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4911490.0, ans=0.125 2024-08-20 18:47:12,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.265e+01 2.540e+01 2.946e+01 8.302e+02, threshold=5.079e+01, percent-clipped=3.0 2024-08-20 18:47:15,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4911490.0, ans=0.125 2024-08-20 18:47:34,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4911590.0, ans=0.125 2024-08-20 18:47:35,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4911590.0, ans=0.0 2024-08-20 18:47:43,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4911590.0, ans=0.1 2024-08-20 18:47:46,322 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2200, loss[loss=0.09787, beats_loss=0.01155, ecapa_loss=0.0001314, whisper_loss=0.08501, over 23124.00 frames. ], tot_loss[loss=0.09943, beats_loss=0.01063, ecapa_loss=0.0001316, whisper_loss=0.08749, over 3741988.94 frames. 
], batch size: 93, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:47:49,618 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06644881516695023, model_norm_threshold=50.791358947753906 2024-08-20 18:47:49,789 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.846e+04, grad_sumsq=7.846e+04, orig_rms_sq=1.000e+00 2024-08-20 18:48:30,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4911890.0, ans=0.125 2024-08-20 18:48:30,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4911890.0, ans=0.1 2024-08-20 18:48:42,241 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 21 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-20 18:49:04,169 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 18:49:17,690 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2250, loss[loss=0.1293, beats_loss=0.0107, ecapa_loss=0.0001431, whisper_loss=0.1171, over 22057.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01049, ecapa_loss=0.0001317, whisper_loss=0.08868, over 3763067.25 frames. ], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:49:20,273 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 18:49:45,524 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
35 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 18:49:51,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4912290.0, ans=0.125 2024-08-20 18:50:07,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4912390.0, ans=0.125 2024-08-20 18:50:14,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.207e+01 2.453e+01 2.665e+01 7.644e+02, threshold=4.907e+01, percent-clipped=1.0 2024-08-20 18:50:27,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4912490.0, ans=0.2 2024-08-20 18:50:31,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4912590.0, ans=0.2 2024-08-20 18:50:38,214 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 35 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 18:50:45,482 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 18:50:45,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4912590.0, ans=0.125 2024-08-20 18:50:47,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4912690.0, ans=0.125 2024-08-20 18:50:48,188 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2300, loss[loss=0.08372, beats_loss=0.01148, ecapa_loss=0.000133, whisper_loss=0.07091, over 16465.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001332, whisper_loss=0.08971, over 3777284.63 frames. 
], batch size: 67, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:51:03,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4912690.0, ans=0.125 2024-08-20 18:51:15,285 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 18:51:19,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4912790.0, ans=0.125 2024-08-20 18:51:21,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4912790.0, ans=0.04949747468305833 2024-08-20 18:51:29,987 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 22 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-20 18:51:34,036 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 18:52:12,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4913090.0, ans=0.0 2024-08-20 18:52:17,059 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2350, loss[loss=0.09817, beats_loss=0.01112, ecapa_loss=0.000139, whisper_loss=0.08566, over 16649.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001353, whisper_loss=0.08993, over 3762132.93 frames. ], batch size: 68, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:52:26,323 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-20 18:52:30,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. 
limit=6.0 2024-08-20 18:52:38,712 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04937309771776199, model_norm_threshold=49.067115783691406 2024-08-20 18:52:38,880 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.547e+05, grad_sumsq=4.707e+04, orig_rms_sq=3.286e+00 2024-08-20 18:52:55,242 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 18:53:14,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.360e+01 2.620e+01 2.900e+01 9.938e+02, threshold=5.241e+01, percent-clipped=3.0 2024-08-20 18:53:19,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4913490.0, ans=0.07 2024-08-20 18:53:26,452 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 18:53:36,860 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 18:53:49,351 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2400, loss[loss=0.1135, beats_loss=0.009144, ecapa_loss=0.0001253, whisper_loss=0.1031, over 21740.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.0001368, whisper_loss=0.09045, over 3767122.92 frames. ], batch size: 84, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:53:51,111 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 18:53:57,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4913690.0, ans=0.125 2024-08-20 18:53:57,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4913690.0, ans=0.0 2024-08-20 18:53:58,734 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 18:53:59,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4913690.0, ans=0.125 2024-08-20 18:54:02,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4913690.0, ans=0.125 2024-08-20 18:54:09,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2024-08-20 18:54:28,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4913890.0, ans=0.125 2024-08-20 18:54:29,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4913890.0, ans=0.125 2024-08-20 18:54:54,113 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 18:54:56,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=22.5 2024-08-20 18:55:14,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4914090.0, ans=0.0 2024-08-20 18:55:18,372 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 18:55:19,316 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2450, loss[loss=0.1128, beats_loss=0.008859, ecapa_loss=0.0001417, whisper_loss=0.1025, over 22739.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001373, whisper_loss=0.09061, over 3774625.80 frames. 
], batch size: 88, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:55:22,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2024-08-20 18:55:40,108 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 22 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-20 18:55:42,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4914290.0, ans=0.0 2024-08-20 18:55:43,771 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 16 from LS+wenet, 7 from Vox, 27 fro AS 2024-08-20 18:56:07,836 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 20 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-20 18:56:07,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4914390.0, ans=0.0 2024-08-20 18:56:16,542 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.269e+01 2.495e+01 2.810e+01 4.376e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-20 18:56:25,817 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 18:56:53,405 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2500, loss[loss=0.1008, beats_loss=0.009411, ecapa_loss=0.0001631, whisper_loss=0.08972, over 19453.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001366, whisper_loss=0.0902, over 3752868.53 frames. ], batch size: 80, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:57:07,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4914690.0, ans=0.2 2024-08-20 18:57:12,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.99 vs. 
limit=22.5 2024-08-20 18:57:18,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4914790.0, ans=0.125 2024-08-20 18:57:19,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4914790.0, ans=0.0 2024-08-20 18:57:46,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-20 18:58:05,300 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 19 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 18:58:12,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4915090.0, ans=0.125 2024-08-20 18:58:13,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4915090.0, ans=0.0 2024-08-20 18:58:20,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4915190.0, ans=0.0 2024-08-20 18:58:21,728 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2550, loss[loss=0.1103, beats_loss=0.009943, ecapa_loss=0.0001201, whisper_loss=0.09919, over 20320.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001366, whisper_loss=0.08993, over 3755679.50 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:58:31,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4915190.0, ans=0.125 2024-08-20 18:58:45,440 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 18:58:46,044 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.90 vs. 
limit=15.0 2024-08-20 18:59:08,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4915390.0, ans=0.125 2024-08-20 18:59:10,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4915390.0, ans=0.125 2024-08-20 18:59:12,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4915390.0, ans=0.0 2024-08-20 18:59:14,013 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 36 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 18:59:19,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.358e+01 2.574e+01 2.752e+01 5.119e+01, threshold=5.148e+01, percent-clipped=1.0 2024-08-20 18:59:43,146 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 18:59:54,304 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 18:59:55,385 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2600, loss[loss=0.1092, beats_loss=0.007755, ecapa_loss=0.0001516, whisper_loss=0.09991, over 17225.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001361, whisper_loss=0.09004, over 3751746.73 frames. ], batch size: 64, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:59:55,946 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 18:59:58,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4915690.0, ans=0.125 2024-08-20 19:00:04,112 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 19 from LS+wenet, 33 from Vox, 37 fro AS 2024-08-20 19:00:41,326 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 19:00:43,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4915890.0, ans=0.125 2024-08-20 19:00:48,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4915890.0, ans=0.125 2024-08-20 19:00:52,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4915990.0, ans=0.0 2024-08-20 19:01:05,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-08-20 19:01:17,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4916090.0, ans=0.125 2024-08-20 19:01:22,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4916090.0, ans=0.1 2024-08-20 19:01:30,875 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2650, loss[loss=0.113, beats_loss=0.009527, ecapa_loss=0.0001751, whisper_loss=0.1017, over 21127.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0103, ecapa_loss=0.0001371, whisper_loss=0.0902, over 3783780.77 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:01:34,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=4916190.0, ans=0.02 2024-08-20 19:01:38,294 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 
20 from LS+wenet, 17 from Vox, 15 fro AS 2024-08-20 19:02:02,525 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0383872389793396, model_norm_threshold=51.48301696777344 2024-08-20 19:02:02,693 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.236e+05, grad_sumsq=2.236e+05, orig_rms_sq=1.000e+00 2024-08-20 19:02:25,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.307e+01 2.525e+01 3.012e+01 1.341e+03, threshold=5.051e+01, percent-clipped=2.0 2024-08-20 19:02:28,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4916490.0, ans=0.125 2024-08-20 19:02:33,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2024-08-20 19:02:38,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4916490.0, ans=0.125 2024-08-20 19:02:40,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4916590.0, ans=0.125 2024-08-20 19:02:41,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-20 19:02:45,530 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 9 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 19:02:50,900 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 19:02:59,060 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2700, loss[loss=0.1044, beats_loss=0.008667, ecapa_loss=0.0001432, whisper_loss=0.09429, over 16574.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.08974, over 3771199.04 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:03:14,217 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 19:03:15,775 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 19:03:33,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4916890.0, ans=0.0 2024-08-20 19:03:36,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4916890.0, ans=0.125 2024-08-20 19:04:07,298 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 33 from LS+wenet, 13 from Vox, 48 fro AS 2024-08-20 19:04:24,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4917090.0, ans=0.125 2024-08-20 19:04:27,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4917190.0, ans=0.125 2024-08-20 19:04:27,976 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2750, loss[loss=0.1072, beats_loss=0.01031, ecapa_loss=0.0001376, whisper_loss=0.09555, over 19879.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.08949, over 3769498.62 frames. 
], batch size: 80, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:04:31,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4917190.0, ans=0.125 2024-08-20 19:04:32,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4917190.0, ans=0.2 2024-08-20 19:04:37,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4917190.0, ans=0.1 2024-08-20 19:05:02,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4917290.0, ans=0.125 2024-08-20 19:05:17,668 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 28 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-20 19:05:17,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4917390.0, ans=0.125 2024-08-20 19:05:21,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4917390.0, ans=0.1 2024-08-20 19:05:28,450 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.332e+01 2.555e+01 2.897e+01 4.432e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-20 19:05:52,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4917590.0, ans=0.05 2024-08-20 19:06:04,503 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2800, loss[loss=0.06011, beats_loss=0.01316, ecapa_loss=0.0001249, whisper_loss=0.04571, over 15685.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.09007, over 3799715.50 frames. 
], batch size: 67, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:06:05,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4917690.0, ans=0.125 2024-08-20 19:06:13,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4917690.0, ans=0.0 2024-08-20 19:06:23,946 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 37 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 19:06:32,139 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 18 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-20 19:06:35,462 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 19:06:40,539 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 19:06:41,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=10.0 2024-08-20 19:06:53,549 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-20 19:07:04,110 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 19:07:17,805 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 22 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-20 19:07:32,815 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2850, loss[loss=0.1088, beats_loss=0.01145, ecapa_loss=0.0001654, whisper_loss=0.09566, over 15976.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001372, whisper_loss=0.08985, over 3794905.33 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:07:40,786 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
16 from LS+wenet, 34 from Vox, 39 fro AS 2024-08-20 19:07:48,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=4918190.0, ans=22.5 2024-08-20 19:07:55,595 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 17 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 19:08:15,327 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 19:08:19,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4918390.0, ans=0.05 2024-08-20 19:08:29,384 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.343e+01 2.572e+01 2.859e+01 3.545e+01, threshold=5.143e+01, percent-clipped=0.0 2024-08-20 19:08:36,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4918490.0, ans=0.125 2024-08-20 19:08:48,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4918590.0, ans=0.05 2024-08-20 19:08:50,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4918590.0, ans=10.0 2024-08-20 19:09:02,431 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 19:09:03,496 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2900, loss[loss=0.1048, beats_loss=0.009923, ecapa_loss=0.0001229, whisper_loss=0.09361, over 15056.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001373, whisper_loss=0.09053, over 3780958.58 frames. ], batch size: 57, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:09:06,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.07 vs. 
limit=10.0 2024-08-20 19:09:11,436 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 from AS 2024-08-20 19:09:21,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4918790.0, ans=0.125 2024-08-20 19:09:24,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4918790.0, ans=0.125 2024-08-20 19:09:44,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4918890.0, ans=10.0 2024-08-20 19:09:49,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-08-20 19:09:57,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4918990.0, ans=0.125 2024-08-20 19:10:01,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4918990.0, ans=0.125 2024-08-20 19:10:15,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4919090.0, ans=0.1 2024-08-20 19:10:17,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4919090.0, ans=0.0 2024-08-20 19:10:24,212 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS 2024-08-20 19:10:24,583 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.20 vs. 
limit=12.0 2024-08-20 19:10:30,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4919090.0, ans=0.0 2024-08-20 19:10:32,716 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 2950, loss[loss=0.109, beats_loss=0.01028, ecapa_loss=0.0001243, whisper_loss=0.09744, over 23138.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001394, whisper_loss=0.08957, over 3796383.10 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:10:39,412 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 21 from LS+wenet, 21 from Vox, 37 from AS 2024-08-20 19:11:01,265 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS 2024-08-20 19:11:03,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2024-08-20 19:11:12,912 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:11:26,398 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 12 from Vox, 28 from AS 2024-08-20 19:11:30,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.232e+01 2.550e+01 2.898e+01 2.799e+02, threshold=5.099e+01, percent-clipped=2.0 2024-08-20 19:11:52,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4919590.0, ans=0.0 2024-08-20 19:12:05,895 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3000, loss[loss=0.1319, beats_loss=0.00942, ecapa_loss=0.0001212, whisper_loss=0.1212, over 24079.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001384, whisper_loss=0.08983, over 3825982.48 frames. 
], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:12:05,896 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 19:12:42,563 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000513, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 19:12:56,315 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.8062, 2.3745, 2.3055, 1.9752], device='cuda:3') 2024-08-20 19:13:06,202 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on SV_voxceleb1: loss=0.003961, beats_loss=0, ecapa_loss=0.0003961, whisper_loss=0, over 944235.00 frames. 2024-08-20 19:13:55,973 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4841, 3.0198, 2.3768, 2.2102, 2.2035, 2.0600, 2.4437, 2.5932], device='cuda:3') 2024-08-20 19:14:44,919 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 19:14:44,923 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 19:14:52,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4919690.0, ans=0.0 2024-08-20 19:14:59,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4919690.0, ans=0.125 2024-08-20 19:15:17,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4919790.0, ans=0.2 2024-08-20 19:15:26,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4919890.0, ans=0.125 2024-08-20 19:15:30,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4919890.0, ans=0.0 2024-08-20 19:15:50,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4919990.0, ans=0.125 2024-08-20 19:16:02,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2024-08-20 19:16:03,195 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 from AS 2024-08-20 19:16:03,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4920090.0, ans=0.2 2024-08-20 19:16:15,034 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3050, loss[loss=0.09134, beats_loss=0.01219, ecapa_loss=0.0001176, whisper_loss=0.07798, over 20631.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.09076, over 3841857.24 frames. 
], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:16:20,868 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 from AS 2024-08-20 19:16:38,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4920290.0, ans=0.2 2024-08-20 19:16:46,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-08-20 19:17:09,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.284e+01 2.588e+01 2.897e+01 2.080e+02, threshold=5.176e+01, percent-clipped=1.0 2024-08-20 19:17:10,006 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 from AS 2024-08-20 19:17:13,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=22.5 2024-08-20 19:17:21,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-20 19:17:28,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4920590.0, ans=0.1 2024-08-20 19:17:41,208 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3100, loss[loss=0.1124, beats_loss=0.008032, ecapa_loss=0.0001429, whisper_loss=0.1029, over 16512.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.09079, over 3845968.83 frames. ], batch size: 60, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:17:43,499 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
20 from LS+wenet, 17 from Vox, 21 from AS 2024-08-20 19:17:43,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4920690.0, ans=0.1 2024-08-20 19:18:37,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4920990.0, ans=0.125 2024-08-20 19:18:45,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4920990.0, ans=0.2 2024-08-20 19:18:54,063 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.018e-01 2024-08-20 19:19:11,052 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3150, loss[loss=0.1034, beats_loss=0.009627, ecapa_loss=0.0001576, whisper_loss=0.09219, over 16499.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001397, whisper_loss=0.0905, over 3815275.94 frames. ], batch size: 67, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:19:25,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2024-08-20 19:19:33,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4921290.0, ans=0.1 2024-08-20 19:19:51,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.69 vs. 
limit=15.0 2024-08-20 19:19:54,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4921390.0, ans=0.125 2024-08-20 19:20:01,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4921390.0, ans=0.1 2024-08-20 19:20:05,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4921490.0, ans=0.125 2024-08-20 19:20:06,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.207e+01 2.457e+01 2.685e+01 3.583e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-20 19:20:08,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4921490.0, ans=0.125 2024-08-20 19:20:10,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4921490.0, ans=0.0 2024-08-20 19:20:30,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=22.5 2024-08-20 19:20:32,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4921590.0, ans=0.5 2024-08-20 19:20:38,072 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3200, loss[loss=0.1018, beats_loss=0.0139, ecapa_loss=0.0001218, whisper_loss=0.08666, over 21429.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001397, whisper_loss=0.0903, over 3831610.69 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:20:53,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4921690.0, ans=0.2 2024-08-20 19:20:55,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4921790.0, ans=0.0 2024-08-20 19:21:08,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4921790.0, ans=0.125 2024-08-20 19:21:22,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4921890.0, ans=0.0 2024-08-20 19:22:03,076 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3250, loss[loss=0.1036, beats_loss=0.007828, ecapa_loss=0.0001624, whisper_loss=0.09416, over 22849.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01037, ecapa_loss=0.0001407, whisper_loss=0.09128, over 3817589.24 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:22:24,030 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 from AS 2024-08-20 19:22:26,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4922290.0, ans=0.0 2024-08-20 19:22:56,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.200e+01 2.511e+01 2.776e+01 3.425e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-20 19:23:17,309 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
19 from LS+wenet, 24 from Vox, 27 from AS 2024-08-20 19:23:17,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4922590.0, ans=0.025 2024-08-20 19:23:28,482 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3300, loss[loss=0.09826, beats_loss=0.00983, ecapa_loss=0.0001853, whisper_loss=0.08658, over 18986.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.09135, over 3798864.84 frames. ], batch size: 80, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:23:33,476 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:23:36,266 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 from AS 2024-08-20 19:23:41,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4922690.0, ans=0.2 2024-08-20 19:23:47,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4922790.0, ans=0.0 2024-08-20 19:24:03,043 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 from AS 2024-08-20 19:24:04,826 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 from AS 2024-08-20 19:24:08,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4922890.0, ans=0.0 2024-08-20 19:24:34,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4922990.0, ans=0.5 2024-08-20 19:24:37,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2024-08-20 19:24:38,830 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
28 from LS+wenet, 17 from Vox, 27 from AS 2024-08-20 19:24:39,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4923090.0, ans=0.0 2024-08-20 19:24:53,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4923090.0, ans=0.125 2024-08-20 19:24:55,512 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3350, loss[loss=0.1137, beats_loss=0.01055, ecapa_loss=0.0001494, whisper_loss=0.1017, over 22668.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09167, over 3798805.89 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:24:56,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4923190.0, ans=0.125 2024-08-20 19:25:05,219 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 26 from LS+wenet, 14 from Vox, 39 from AS 2024-08-20 19:25:13,770 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 from AS 2024-08-20 19:25:19,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4923290.0, ans=0.0 2024-08-20 19:25:21,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4923290.0, ans=0.125 2024-08-20 19:25:28,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4923290.0, ans=0.0 2024-08-20 19:25:28,114 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:25:42,865 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
29 from LS+wenet, 16 from Vox, 42 from AS 2024-08-20 19:25:49,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.220e+01 2.419e+01 2.738e+01 3.918e+01, threshold=4.837e+01, percent-clipped=0.0 2024-08-20 19:25:59,896 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 from AS 2024-08-20 19:26:11,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4923590.0, ans=0.1 2024-08-20 19:26:12,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4923590.0, ans=0.1 2024-08-20 19:26:16,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4923590.0, ans=0.07 2024-08-20 19:26:21,895 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3400, loss[loss=0.1235, beats_loss=0.008558, ecapa_loss=0.0001387, whisper_loss=0.1135, over 22839.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01024, ecapa_loss=0.0001408, whisper_loss=0.09154, over 3785674.22 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:26:37,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4923690.0, ans=0.125 2024-08-20 19:26:41,829 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 19 from LS+wenet, 25 from Vox, 25 from AS 2024-08-20 19:26:50,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4923790.0, ans=0.2 2024-08-20 19:26:52,419 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 33 from LS+wenet, 29 from Vox, 31 from AS 2024-08-20 19:27:40,814 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
14 from LS+wenet, 11 from Vox, 28 from AS 2024-08-20 19:27:48,626 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3450, loss[loss=0.085, beats_loss=0.01127, ecapa_loss=0.0001528, whisper_loss=0.0722, over 21989.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.0001401, whisper_loss=0.09052, over 3798269.79 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:27:54,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4924190.0, ans=0.0 2024-08-20 19:28:05,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4924290.0, ans=0.0 2024-08-20 19:28:07,147 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 from AS 2024-08-20 19:28:26,194 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 29 from LS+wenet, 13 from Vox, 39 from AS 2024-08-20 19:28:29,778 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 from AS 2024-08-20 19:28:29,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4924390.0, ans=0.125 2024-08-20 19:28:33,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-20 19:28:34,706 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 from AS 2024-08-20 19:28:41,633 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 
27 from LS+wenet, 26 from Vox, 32 from AS 2024-08-20 19:28:42,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.395e+01 2.734e+01 3.067e+01 2.505e+02, threshold=5.467e+01, percent-clipped=4.0 2024-08-20 19:28:50,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4924490.0, ans=0.125 2024-08-20 19:29:15,168 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3500, loss[loss=0.09588, beats_loss=0.01246, ecapa_loss=0.0001292, whisper_loss=0.08213, over 18243.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.09048, over 3810224.46 frames. ], batch size: 72, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:29:24,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4924690.0, ans=0.0 2024-08-20 19:29:31,219 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 from AS 2024-08-20 19:29:36,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4924790.0, ans=0.1 2024-08-20 19:29:56,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=12.0 2024-08-20 19:29:58,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2024-08-20 19:29:59,472 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 23 from LS+wenet, 11 from Vox, 29 from AS 2024-08-20 19:30:08,344 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-20 19:30:42,355 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3550, loss[loss=0.1077, beats_loss=0.009246, ecapa_loss=0.0001442, whisper_loss=0.09706, over 19252.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001394, whisper_loss=0.09015, over 3838584.94 frames. ], batch size: 75, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:30:45,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4925190.0, ans=0.0 2024-08-20 19:30:53,986 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.745e-01 2024-08-20 19:30:59,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4925290.0, ans=0.0 2024-08-20 19:31:00,892 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS 2024-08-20 19:31:15,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4925290.0, ans=0.1 2024-08-20 19:31:29,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4925390.0, ans=0.1 2024-08-20 19:31:36,715 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.289e+01 2.465e+01 2.729e+01 3.504e+01, threshold=4.930e+01, percent-clipped=0.0 2024-08-20 19:31:42,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-08-20 19:31:49,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=15.0 2024-08-20 19:32:03,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4925590.0, ans=0.125 2024-08-20 19:32:09,080 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3600, loss[loss=0.07116, beats_loss=0.01134, ecapa_loss=0.0001421, whisper_loss=0.0584, over 16114.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.09034, over 3848699.57 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:32:16,834 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 from AS 2024-08-20 19:32:45,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4925890.0, ans=0.09899494936611666 2024-08-20 19:32:49,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-08-20 19:32:50,561 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 27 from Vox, 30 from AS 2024-08-20 19:33:06,136 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 from AS 2024-08-20 19:33:14,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4925990.0, ans=0.125 2024-08-20 19:33:18,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4926090.0, ans=0.0 2024-08-20 19:33:30,252 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 15 from LS+wenet, 10 from Vox, 27 from AS 2024-08-20 19:33:34,577 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3650, loss[loss=0.1074, beats_loss=0.01148, ecapa_loss=0.0001404, whisper_loss=0.09448, over 23245.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001403, whisper_loss=0.0901, over 3852109.37 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:33:51,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.80 vs. limit=10.0 2024-08-20 19:34:04,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2024-08-20 19:34:06,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-20 19:34:12,347 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 13 from LS+wenet, 14 from Vox, 31 from AS 2024-08-20 19:34:28,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.676e+01 2.198e+01 2.421e+01 2.738e+01 4.465e+02, threshold=4.843e+01, percent-clipped=1.0 2024-08-20 19:34:29,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4926490.0, ans=0.025 2024-08-20 19:34:48,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4926590.0, ans=0.0 2024-08-20 19:34:48,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4926590.0, ans=0.0 2024-08-20 19:34:54,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4926590.0, ans=0.1 2024-08-20 19:34:54,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4926590.0, ans=0.125 2024-08-20 19:35:01,491 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3700, 
loss[loss=0.08131, beats_loss=0.01367, ecapa_loss=0.0001029, whisper_loss=0.06661, over 20858.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.08895, over 3806494.41 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:35:04,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4926690.0, ans=0.07 2024-08-20 19:35:22,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-20 19:35:28,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4926790.0, ans=0.0 2024-08-20 19:35:36,749 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 from AS 2024-08-20 19:35:42,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4926890.0, ans=0.125 2024-08-20 19:35:47,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4926890.0, ans=0.125 2024-08-20 19:35:49,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=12.0 2024-08-20 19:36:10,388 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 from AS 2024-08-20 19:36:29,328 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3750, loss[loss=0.08532, beats_loss=0.007837, ecapa_loss=0.0001088, whisper_loss=0.07639, over 14567.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.08899, over 3792012.23 frames. 
], batch size: 50, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:36:31,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4927190.0, ans=0.0 2024-08-20 19:36:33,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4927190.0, ans=0.0 2024-08-20 19:36:37,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4927190.0, ans=0.2 2024-08-20 19:36:39,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4927190.0, ans=0.125 2024-08-20 19:36:44,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4927190.0, ans=0.125 2024-08-20 19:36:51,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4927290.0, ans=0.125 2024-08-20 19:36:53,132 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 28 from LS+wenet, 13 from Vox, 27 from AS 2024-08-20 19:37:01,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4927290.0, ans=0.125 2024-08-20 19:37:15,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4927390.0, ans=0.125 2024-08-20 19:37:22,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.233e+01 2.505e+01 2.774e+01 5.527e+01, threshold=5.010e+01, percent-clipped=2.0 2024-08-20 19:37:26,589 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 24 from Vox, 33 from AS 2024-08-20 19:37:28,466 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
22 from LS+wenet, 18 from Vox, 40 from AS
2024-08-20 19:37:30,031 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS
2024-08-20 19:37:47,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4927590.0, ans=0.2
2024-08-20 19:37:55,090 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3800, loss[loss=0.1069, beats_loss=0.01089, ecapa_loss=0.0001243, whisper_loss=0.09474, over 21027.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.000141, whisper_loss=0.08978, over 3786522.82 frames. ], batch size: 80, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:37:57,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4927690.0, ans=0.125
2024-08-20 19:38:38,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.38 vs. limit=10.0
2024-08-20 19:38:56,687 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS
2024-08-20 19:38:57,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=12.0
2024-08-20 19:38:58,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4927990.0, ans=0.125
2024-08-20 19:39:12,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4928090.0, ans=0.125
2024-08-20 19:39:21,975 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3850, loss[loss=0.1116, beats_loss=0.01038, ecapa_loss=0.0001333, whisper_loss=0.09992, over 23245.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001408, whisper_loss=0.08935, over 3766415.30 frames.
], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:39:22,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4928190.0, ans=0.0
2024-08-20 19:39:24,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4928190.0, ans=0.125
2024-08-20 19:39:41,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4928290.0, ans=0.0
2024-08-20 19:39:48,634 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 from AS
2024-08-20 19:40:07,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4928390.0, ans=0.125
2024-08-20 19:40:16,897 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.370e+01 2.629e+01 2.963e+01 4.700e+01, threshold=5.257e+01, percent-clipped=0.0
2024-08-20 19:40:40,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4928590.0, ans=0.125
2024-08-20 19:40:50,866 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3900, loss[loss=0.09026, beats_loss=0.009792, ecapa_loss=0.0001314, whisper_loss=0.07916, over 13925.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001404, whisper_loss=0.08937, over 3793736.82 frames. ], batch size: 54, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:41:44,698 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 from AS
2024-08-20 19:42:16,735 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 3950, loss[loss=0.1159, beats_loss=0.01019, ecapa_loss=0.0001201, whisper_loss=0.1045, over 24629.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08992, over 3829641.58 frames.
], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:43:03,371 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS
2024-08-20 19:43:04,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4929390.0, ans=0.035
2024-08-20 19:43:11,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.377e+01 2.625e+01 2.908e+01 3.824e+01, threshold=5.250e+01, percent-clipped=0.0
2024-08-20 19:43:14,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4929490.0, ans=0.125
2024-08-20 19:43:19,104 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 12 from LS+wenet, 9 from Vox, 30 from AS
2024-08-20 19:43:22,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4929490.0, ans=0.125
2024-08-20 19:43:29,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4929590.0, ans=0.125
2024-08-20 19:43:33,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4929590.0, ans=0.0
2024-08-20 19:43:36,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4929590.0, ans=0.125
2024-08-20 19:43:44,292 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4000, loss[loss=0.1006, beats_loss=0.01069, ecapa_loss=0.0001248, whisper_loss=0.08861, over 22999.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.08961, over 3799106.47 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:43:46,376 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts.
16 from LS+wenet, 13 from Vox, 20 from AS
2024-08-20 19:43:56,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4929690.0, ans=0.1
2024-08-20 19:43:59,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4929690.0, ans=0.1
2024-08-20 19:44:06,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4929790.0, ans=0.09899494936611666
2024-08-20 19:44:28,055 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 from AS
2024-08-20 19:44:52,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.25 vs. limit=22.5
2024-08-20 19:44:57,050 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 18 from Vox, 41 from AS
2024-08-20 19:45:00,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.48 vs. limit=22.5
2024-08-20 19:45:03,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2024-08-20 19:45:09,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4930090.0, ans=0.1
2024-08-20 19:45:09,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=12.0
2024-08-20 19:45:14,050 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4050, loss[loss=0.1009, beats_loss=0.008681, ecapa_loss=0.0001855, whisper_loss=0.09038, over 16181.00 frames.
], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.08983, over 3836825.17 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:45:14,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4930190.0, ans=0.0
2024-08-20 19:45:20,202 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 13 from Vox, 46 from AS
2024-08-20 19:45:20,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4930190.0, ans=0.0
2024-08-20 19:45:35,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4930290.0, ans=0.035
2024-08-20 19:45:43,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4930290.0, ans=0.0
2024-08-20 19:45:43,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4930290.0, ans=0.0
2024-08-20 19:46:03,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0
2024-08-20 19:46:11,279 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.272e+01 2.496e+01 2.748e+01 3.675e+01, threshold=4.991e+01, percent-clipped=0.0
2024-08-20 19:46:15,667 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 from AS
2024-08-20 19:46:21,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.69 vs.
limit=15.0
2024-08-20 19:46:23,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4930490.0, ans=0.0
2024-08-20 19:46:23,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4930490.0, ans=0.1
2024-08-20 19:46:41,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4930590.0, ans=0.0
2024-08-20 19:46:41,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4930590.0, ans=0.07
2024-08-20 19:46:44,572 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4100, loss[loss=0.1024, beats_loss=0.009548, ecapa_loss=0.0001443, whisper_loss=0.09139, over 18099.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001418, whisper_loss=0.09037, over 3857746.58 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:47:35,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4930890.0, ans=0.09899494936611666
2024-08-20 19:47:55,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.45 vs. limit=22.5
2024-08-20 19:48:12,588 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4150, loss[loss=0.1066, beats_loss=0.01004, ecapa_loss=0.0001234, whisper_loss=0.09529, over 22890.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.00014, whisper_loss=0.09111, over 3835775.40 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:48:26,813 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 12 from LS+wenet, 17 from Vox, 21 from AS
2024-08-20 19:48:39,322 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts.
28 from LS+wenet, 22 from Vox, 31 from AS
2024-08-20 19:48:42,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=4931290.0, ans=0.05
2024-08-20 19:48:45,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4931290.0, ans=0.125
2024-08-20 19:48:52,203 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 from AS
2024-08-20 19:48:54,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4931390.0, ans=0.125
2024-08-20 19:48:54,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4931390.0, ans=0.2
2024-08-20 19:49:09,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.357e+01 2.563e+01 2.804e+01 4.051e+01, threshold=5.127e+01, percent-clipped=0.0
2024-08-20 19:49:31,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4931590.0, ans=0.125
2024-08-20 19:49:42,271 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4200, loss[loss=0.1213, beats_loss=0.008465, ecapa_loss=0.0001503, whisper_loss=0.1113, over 14088.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09183, over 3819406.29 frames. ], batch size: 54, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:50:23,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs.
limit=15.0
2024-08-20 19:50:28,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4931890.0, ans=0.0
2024-08-20 19:50:32,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4931890.0, ans=0.1
2024-08-20 19:50:36,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4931990.0, ans=0.1
2024-08-20 19:50:53,713 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS
2024-08-20 19:51:05,312 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 12 from LS+wenet, 17 from Vox, 32 from AS
2024-08-20 19:51:05,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4932090.0, ans=0.07
2024-08-20 19:51:11,294 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4250, loss[loss=0.09647, beats_loss=0.01033, ecapa_loss=0.0001891, whisper_loss=0.08425, over 20737.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01038, ecapa_loss=0.0001397, whisper_loss=0.09143, over 3815081.25 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:51:13,998 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 26 from LS+wenet, 29 from Vox, 39 from AS
2024-08-20 19:51:19,809 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 from AS
2024-08-20 19:51:39,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0
2024-08-20 19:51:44,712 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts.
32 from LS+wenet, 21 from Vox, 33 from AS
2024-08-20 19:52:08,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.607e+01 2.265e+01 2.580e+01 2.966e+01 3.429e+02, threshold=5.160e+01, percent-clipped=3.0
2024-08-20 19:52:21,241 WARNING [optim.py:496] (3/4) Scaling gradients by 0.022736379876732826, model_norm_threshold=51.5983772277832
2024-08-20 19:52:21,409 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.129e+05, grad_sumsq=7.129e+05, orig_rms_sq=1.000e+00
2024-08-20 19:52:32,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4932590.0, ans=0.1
2024-08-20 19:52:35,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4932590.0, ans=0.125
2024-08-20 19:52:39,738 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4300, loss[loss=0.09895, beats_loss=0.00975, ecapa_loss=0.0001342, whisper_loss=0.08786, over 19363.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.09137, over 3840186.78 frames. ], batch size: 78, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:52:56,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4932790.0, ans=0.125
2024-08-20 19:53:00,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4932790.0, ans=0.125
2024-08-20 19:53:20,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4932890.0, ans=0.125
2024-08-20 19:53:25,608 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts.
16 from LS+wenet, 11 from Vox, 28 from AS
2024-08-20 19:53:33,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4932990.0, ans=0.125
2024-08-20 19:53:37,488 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0
2024-08-20 19:53:40,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4932990.0, ans=0.125
2024-08-20 19:53:46,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4932990.0, ans=0.125
2024-08-20 19:54:05,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4933090.0, ans=0.1
2024-08-20 19:54:08,032 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4350, loss[loss=0.1156, beats_loss=0.004641, ecapa_loss=0.0002351, whisper_loss=0.1086, over 14101.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.09056, over 3847456.07 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:54:08,870 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts.
32 from LS+wenet, 16 from Vox, 36 from AS
2024-08-20 19:54:09,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4933190.0, ans=0.125
2024-08-20 19:54:09,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4933190.0, ans=0.125
2024-08-20 19:54:24,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4933290.0, ans=0.125
2024-08-20 19:54:27,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0
2024-08-20 19:54:30,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4933290.0, ans=0.1
2024-08-20 19:54:49,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4933390.0, ans=0.125
2024-08-20 19:55:03,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4933490.0, ans=0.1
2024-08-20 19:55:04,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.347e+01 2.590e+01 2.980e+01 2.269e+03, threshold=5.180e+01, percent-clipped=1.0
2024-08-20 19:55:12,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4933490.0, ans=0.125
2024-08-20 19:55:27,005 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 from AS
2024-08-20 19:55:29,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.48 vs.
limit=15.0
2024-08-20 19:55:30,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4933590.0, ans=0.0
2024-08-20 19:55:32,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4933590.0, ans=0.1
2024-08-20 19:55:35,532 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4400, loss[loss=0.1173, beats_loss=0.007268, ecapa_loss=0.0001849, whisper_loss=0.1081, over 20008.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.09033, over 3854418.36 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:55:48,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0
2024-08-20 19:56:23,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4933890.0, ans=0.0
2024-08-20 19:56:25,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4933890.0, ans=0.125
2024-08-20 19:56:36,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4933990.0, ans=0.0
2024-08-20 19:56:43,902 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 21 from LS+wenet, 16 from Vox, 19 from AS
2024-08-20 19:56:47,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4934090.0, ans=0.125
2024-08-20 19:57:01,136 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 from AS
2024-08-20 19:57:05,946 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4450, loss[loss=0.09522, beats_loss=0.01169, ecapa_loss=0.0001158, whisper_loss=0.08238, over 15280.00 frames.
], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.08998, over 3777267.09 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:57:06,330 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 from AS
2024-08-20 19:57:17,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4934190.0, ans=0.125
2024-08-20 19:57:35,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4934290.0, ans=0.05
2024-08-20 19:57:55,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4934390.0, ans=0.125
2024-08-20 19:58:01,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+01 2.339e+01 2.669e+01 2.965e+01 4.502e+01, threshold=5.338e+01, percent-clipped=0.0
2024-08-20 19:58:06,971 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 38 from LS+wenet, 16 from Vox, 38 from AS
2024-08-20 19:58:15,186 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.387e-01
2024-08-20 19:58:30,652 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4500, loss[loss=0.09241, beats_loss=0.01061, ecapa_loss=0.0001473, whisper_loss=0.08032, over 20216.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0103, ecapa_loss=0.0001408, whisper_loss=0.0903, over 3791864.49 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:58:35,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.03 vs.
limit=15.0
2024-08-20 19:59:05,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4934890.0, ans=10.0
2024-08-20 19:59:07,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4934890.0, ans=0.0
2024-08-20 19:59:12,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4934890.0, ans=0.0
2024-08-20 19:59:13,864 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS
2024-08-20 19:59:16,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0
2024-08-20 19:59:28,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4934990.0, ans=0.125
2024-08-20 19:59:48,342 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS
2024-08-20 19:59:54,422 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4550, loss[loss=0.1207, beats_loss=0.00915, ecapa_loss=0.0001001, whisper_loss=0.1106, over 16337.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01027, ecapa_loss=0.0001403, whisper_loss=0.09012, over 3779695.52 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 19:59:56,817 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 from AS
2024-08-20 20:00:24,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4935290.0, ans=0.025
2024-08-20 20:00:26,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.76 vs.
limit=15.0
2024-08-20 20:00:30,683 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 from AS
2024-08-20 20:00:47,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4935490.0, ans=0.2
2024-08-20 20:00:50,289 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.294e+01 2.455e+01 2.830e+01 3.953e+01, threshold=4.911e+01, percent-clipped=0.0
2024-08-20 20:01:14,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4935590.0, ans=0.125
2024-08-20 20:01:22,207 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4600, loss[loss=0.1169, beats_loss=0.01011, ecapa_loss=0.000136, whisper_loss=0.1054, over 21636.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01029, ecapa_loss=0.0001408, whisper_loss=0.08966, over 3779677.40 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:01:33,195 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 from AS
2024-08-20 20:01:37,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4935690.0, ans=0.125
2024-08-20 20:01:40,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4935790.0, ans=0.2
2024-08-20 20:01:49,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4935790.0, ans=0.125
2024-08-20 20:01:59,189 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts.
16 from LS+wenet, 12 from Vox, 25 from AS
2024-08-20 20:01:59,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4935890.0, ans=0.125
2024-08-20 20:02:07,722 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 22 from LS+wenet, 17 from Vox, 13 from AS
2024-08-20 20:02:18,288 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 18 from Vox, 39 from AS
2024-08-20 20:02:22,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.77 vs. limit=6.0
2024-08-20 20:02:36,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4936090.0, ans=0.1
2024-08-20 20:02:36,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4936090.0, ans=0.125
2024-08-20 20:02:39,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4936090.0, ans=0.125
2024-08-20 20:02:43,015 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0
2024-08-20 20:02:48,901 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4650, loss[loss=0.09172, beats_loss=0.01417, ecapa_loss=0.0001106, whisper_loss=0.07645, over 17862.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01026, ecapa_loss=0.0001411, whisper_loss=0.09013, over 3799144.97 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:02:51,648 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts.
23 from LS+wenet, 17 from Vox, 30 from AS
2024-08-20 20:02:55,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4936190.0, ans=0.125
2024-08-20 20:02:56,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4936190.0, ans=0.2
2024-08-20 20:03:03,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0
2024-08-20 20:03:13,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4936290.0, ans=0.125
2024-08-20 20:03:13,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4936290.0, ans=0.125
2024-08-20 20:03:20,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4936290.0, ans=0.1
2024-08-20 20:03:22,009 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-20 20:03:22,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4936390.0, ans=0.09899494936611666
2024-08-20 20:03:40,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0
2024-08-20 20:03:43,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs.
limit=15.0
2024-08-20 20:03:43,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.237e+01 2.528e+01 2.827e+01 5.668e+01, threshold=5.055e+01, percent-clipped=2.0
2024-08-20 20:03:44,409 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 22 from LS+wenet, 18 from Vox, 20 from AS
2024-08-20 20:03:44,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4936490.0, ans=0.0
2024-08-20 20:03:47,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4936490.0, ans=0.125
2024-08-20 20:04:15,054 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4700, loss[loss=0.1253, beats_loss=0.006923, ecapa_loss=0.0001781, whisper_loss=0.1166, over 17291.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01029, ecapa_loss=0.0001405, whisper_loss=0.09027, over 3812597.34 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17
2024-08-20 20:04:17,541 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 38 from LS+wenet, 15 from Vox, 37 from AS
2024-08-20 20:04:31,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4936790.0, ans=0.125
2024-08-20 20:04:40,812 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 29 from LS+wenet, 30 from Vox, 35 from AS
2024-08-20 20:04:42,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4936790.0, ans=0.125
2024-08-20 20:04:56,932 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts.
35 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-20 20:04:58,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4936890.0, ans=0.125 2024-08-20 20:05:01,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-08-20 20:05:30,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4937090.0, ans=0.09899494936611666 2024-08-20 20:05:39,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4937190.0, ans=0.125 2024-08-20 20:05:39,983 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4750, loss[loss=0.09534, beats_loss=0.009493, ecapa_loss=0.0001845, whisper_loss=0.084, over 16244.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001406, whisper_loss=0.09038, over 3809595.99 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:05:51,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4937190.0, ans=0.1 2024-08-20 20:06:05,558 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
22 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 20:06:19,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4937390.0, ans=0.1 2024-08-20 20:06:22,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4937390.0, ans=0.0 2024-08-20 20:06:37,058 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.283e+01 2.561e+01 2.830e+01 4.199e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-20 20:06:42,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4937490.0, ans=0.125 2024-08-20 20:06:44,888 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 20:06:50,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4937590.0, ans=0.125 2024-08-20 20:06:55,587 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 12 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 20:06:56,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2024-08-20 20:06:59,091 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 20:07:09,508 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4800, loss[loss=0.09259, beats_loss=0.01068, ecapa_loss=0.0001376, whisper_loss=0.08054, over 20716.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01026, ecapa_loss=0.0001401, whisper_loss=0.09081, over 3813248.72 frames. 
], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:07:13,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4937690.0, ans=0.0 2024-08-20 20:07:22,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=12.0 2024-08-20 20:07:56,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4937890.0, ans=0.09899494936611666 2024-08-20 20:08:02,679 WARNING [optim.py:496] (3/4) Scaling gradients by 0.025113865733146667, model_norm_threshold=51.21064758300781 2024-08-20 20:08:02,847 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.205e+06, grad_sumsq=1.205e+06, orig_rms_sq=1.000e+00 2024-08-20 20:08:06,971 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 20:08:35,543 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:08:37,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-08-20 20:08:37,931 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4850, loss[loss=0.08667, beats_loss=0.0116, ecapa_loss=0.00014, whisper_loss=0.07367, over 21791.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01016, ecapa_loss=0.000141, whisper_loss=0.0911, over 3789874.28 frames. 
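The `WARNING [optim.py:496] Scaling gradients by 0.02511..., model_norm_threshold=51.21...` line above reflects norm-based gradient clipping: when the overall gradient norm exceeds the threshold, gradients are multiplied by `threshold / norm`. A sketch under that reading, with illustrative names rather than icefall's actual API:

```python
# Hedged sketch of the gradient scaling behind the optim.py WARNING line:
# assumed rule is scale = threshold / grad_norm when grad_norm > threshold,
# else 1.0 (no scaling). Names are illustrative, not icefall's actual API.
def clip_scale(grad_norm: float, model_norm_threshold: float) -> float:
    if grad_norm <= model_norm_threshold:
        return 1.0
    return model_norm_threshold / grad_norm

# The logged scale 0.025113865733146667 with threshold 51.21064758300781
# implies a gradient norm of threshold / scale, roughly 2.039e+03 -- matching
# the 2.039e+03 grad-norm quartile maximum reported shortly afterwards.
implied_norm = 51.21064758300781 / 0.025113865733146667
print(round(clip_scale(implied_norm, 51.21064758300781), 6))  # 0.025114
```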
], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:08:43,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4938190.0, ans=0.125 2024-08-20 20:08:44,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4938190.0, ans=0.2 2024-08-20 20:08:54,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-08-20 20:08:55,013 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 20:09:03,970 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 20:09:06,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4938290.0, ans=0.0 2024-08-20 20:09:28,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4938390.0, ans=0.125 2024-08-20 20:09:34,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.379e+01 2.535e+01 2.834e+01 2.039e+03, threshold=5.069e+01, percent-clipped=1.0 2024-08-20 20:09:37,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4938490.0, ans=0.95 2024-08-20 20:09:54,367 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.284e+05 2024-08-20 20:10:05,585 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4900, loss[loss=0.08386, beats_loss=0.009395, ecapa_loss=0.000207, whisper_loss=0.0724, over 11840.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01021, ecapa_loss=0.0001406, whisper_loss=0.09106, over 3792918.65 frames. 
], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:10:13,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4938690.0, ans=0.0 2024-08-20 20:10:13,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4938690.0, ans=0.125 2024-08-20 20:10:39,363 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 20:10:41,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4938890.0, ans=0.0 2024-08-20 20:10:45,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4938890.0, ans=0.125 2024-08-20 20:11:04,215 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 23 from Vox, 22 fro AS 2024-08-20 20:11:05,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-20 20:11:19,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4939090.0, ans=0.0 2024-08-20 20:11:34,777 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 4950, loss[loss=0.1, beats_loss=0.009433, ecapa_loss=0.0001265, whisper_loss=0.08933, over 22026.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01024, ecapa_loss=0.0001413, whisper_loss=0.09104, over 3821276.03 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:11:37,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4939190.0, ans=0.035 2024-08-20 20:11:46,377 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 20:12:10,411 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 20:12:17,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4939390.0, ans=0.0 2024-08-20 20:12:23,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4939390.0, ans=0.0 2024-08-20 20:12:32,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.356e+01 2.576e+01 2.948e+01 1.126e+02, threshold=5.153e+01, percent-clipped=1.0 2024-08-20 20:12:47,733 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 20:13:05,122 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5000, loss[loss=0.1054, beats_loss=0.01148, ecapa_loss=0.0001372, whisper_loss=0.09259, over 23233.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01024, ecapa_loss=0.0001417, whisper_loss=0.09088, over 3846687.73 frames. 
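Each "A total of N cuts" line breaks a batch down by source dataset (LS+wenet, Vox, AS), and the per-source counts should sum to the total. A small consistency check, with the helper name being illustrative:

```python
# Sanity check for the "A total of N cuts" lines: the per-dataset counts
# (LS+wenet, Vox, AS) should add up to the reported total. The helper name
# is illustrative; this is not a function from the training code.
def check_cut_counts(total: int, per_source: dict) -> bool:
    return sum(per_source.values()) == total

# One logged batch: 68 cuts = 22 (LS+wenet) + 18 (Vox) + 28 (AS)
print(check_cut_counts(68, {"LS+wenet": 22, "Vox": 18, "AS": 28}))  # True
```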
], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:13:13,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4939690.0, ans=0.125 2024-08-20 20:13:20,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4939690.0, ans=0.0 2024-08-20 20:13:52,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4939890.0, ans=0.07 2024-08-20 20:13:54,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4939890.0, ans=0.1 2024-08-20 20:14:01,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4939990.0, ans=0.0 2024-08-20 20:14:19,688 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 20:14:36,054 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5050, loss[loss=0.09463, beats_loss=0.007302, ecapa_loss=0.000195, whisper_loss=0.08538, over 19608.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01028, ecapa_loss=0.0001424, whisper_loss=0.09013, over 3815448.05 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:14:38,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2024-08-20 20:14:45,126 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
14 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 20:14:45,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4940190.0, ans=0.125 2024-08-20 20:14:56,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2024-08-20 20:15:02,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4940290.0, ans=0.125 2024-08-20 20:15:23,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4940390.0, ans=0.125 2024-08-20 20:15:33,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.260e+01 2.507e+01 2.805e+01 5.478e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-20 20:15:46,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4940590.0, ans=0.0 2024-08-20 20:15:57,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4940590.0, ans=6.0 2024-08-20 20:16:04,929 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5100, loss[loss=0.09907, beats_loss=0.01147, ecapa_loss=0.0001252, whisper_loss=0.08634, over 22005.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01027, ecapa_loss=0.0001425, whisper_loss=0.091, over 3811614.24 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:16:10,957 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 20:16:22,835 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 20:16:30,488 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
22 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-20 20:16:44,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-20 20:16:50,714 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 20:16:54,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4940890.0, ans=0.015 2024-08-20 20:17:12,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4940990.0, ans=22.5 2024-08-20 20:17:15,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4941090.0, ans=0.2 2024-08-20 20:17:20,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4941090.0, ans=0.125 2024-08-20 20:17:20,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4941090.0, ans=0.125 2024-08-20 20:17:26,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4941090.0, ans=0.125 2024-08-20 20:17:32,262 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5150, loss[loss=0.1188, beats_loss=0.008184, ecapa_loss=0.0001711, whisper_loss=0.1089, over 17935.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.000141, whisper_loss=0.09054, over 3828803.11 frames. ], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:17:52,415 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 20:17:59,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4941290.0, ans=0.0 2024-08-20 20:17:59,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2024-08-20 20:18:25,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.96 vs. limit=22.5 2024-08-20 20:18:27,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.219e+01 2.541e+01 2.868e+01 3.859e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-20 20:18:32,967 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 20:18:35,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.07 vs. limit=15.0 2024-08-20 20:18:40,250 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 20:18:47,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4941590.0, ans=0.125 2024-08-20 20:18:57,903 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5200, loss[loss=0.08024, beats_loss=0.01395, ecapa_loss=0.0001362, whisper_loss=0.06493, over 22110.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.09052, over 3842290.72 frames. ], batch size: 97, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:19:39,773 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 
19 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 20:19:43,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. limit=10.0 2024-08-20 20:20:02,305 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 20:20:07,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4942090.0, ans=0.5 2024-08-20 20:20:07,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4942090.0, ans=0.125 2024-08-20 20:20:10,726 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 18 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 20:20:11,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4942090.0, ans=0.0 2024-08-20 20:20:15,293 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.808e+00 2024-08-20 20:20:22,240 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2024-08-20 20:20:22,984 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 20:20:25,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2024-08-20 20:20:26,008 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5250, loss[loss=0.107, beats_loss=0.01063, ecapa_loss=0.0001612, whisper_loss=0.09478, over 21971.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.09083, over 3846645.68 frames. 
], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:20:37,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4942190.0, ans=0.125 2024-08-20 20:20:44,101 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 20:20:44,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4942290.0, ans=0.5 2024-08-20 20:21:05,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4942390.0, ans=0.125 2024-08-20 20:21:21,449 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 19 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 20:21:22,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.241e+01 2.519e+01 2.751e+01 3.972e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 20:21:33,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4942490.0, ans=0.09899494936611666 2024-08-20 20:21:39,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.56 vs. limit=22.5 2024-08-20 20:21:52,300 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 20:21:53,480 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5300, loss[loss=0.1057, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.0939, over 20525.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.09064, over 3814760.14 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:21:54,304 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
38 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 20:22:06,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4942690.0, ans=0.125 2024-08-20 20:22:06,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4942690.0, ans=0.125 2024-08-20 20:22:15,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4942790.0, ans=0.1 2024-08-20 20:22:16,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4942790.0, ans=0.125 2024-08-20 20:22:50,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2024-08-20 20:22:53,417 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.651e-01 2024-08-20 20:23:22,215 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5350, loss[loss=0.09219, beats_loss=0.008641, ecapa_loss=0.0001755, whisper_loss=0.0818, over 14226.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001404, whisper_loss=0.0904, over 3785188.59 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:23:42,713 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 13 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 20:23:42,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4943290.0, ans=0.125 2024-08-20 20:23:43,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-08-20 20:23:47,143 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 20:23:57,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4943390.0, ans=0.0 2024-08-20 20:24:02,022 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 38 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 20:24:05,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4943390.0, ans=15.0 2024-08-20 20:24:06,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4943390.0, ans=0.125 2024-08-20 20:24:14,904 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 20:24:18,070 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 20:24:19,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.338e+01 2.503e+01 2.804e+01 4.042e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-20 20:24:28,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-20 20:24:29,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4943490.0, ans=0.125 2024-08-20 20:24:51,253 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5400, loss[loss=0.1075, beats_loss=0.01087, ecapa_loss=0.0001266, whisper_loss=0.09533, over 22459.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.09005, over 3792352.48 frames. 
], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:25:16,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4943790.0, ans=0.0 2024-08-20 20:25:18,088 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 20:25:31,966 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 20:25:42,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4943990.0, ans=0.125 2024-08-20 20:25:52,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4943990.0, ans=0.125 2024-08-20 20:26:13,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4944090.0, ans=0.125 2024-08-20 20:26:13,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4944090.0, ans=0.1 2024-08-20 20:26:17,974 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5450, loss[loss=0.1082, beats_loss=0.01043, ecapa_loss=0.0001707, whisper_loss=0.09611, over 20496.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001394, whisper_loss=0.09012, over 3780491.63 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:26:35,173 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 21 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 20:26:40,802 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 8 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 20:26:59,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.28 vs. 
limit=15.0 2024-08-20 20:27:00,363 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 20:27:01,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4944390.0, ans=0.125 2024-08-20 20:27:16,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4944490.0, ans=0.125 2024-08-20 20:27:17,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.208e+01 2.419e+01 2.750e+01 4.613e+01, threshold=4.839e+01, percent-clipped=0.0 2024-08-20 20:27:26,718 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 20:27:37,000 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 28 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-20 20:27:48,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2024-08-20 20:27:48,474 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5500, loss[loss=0.1194, beats_loss=0.008305, ecapa_loss=0.0001859, whisper_loss=0.1093, over 16669.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001385, whisper_loss=0.09011, over 3777562.99 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:27:53,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4944690.0, ans=0.0 2024-08-20 20:28:34,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=12.0 2024-08-20 20:28:40,822 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 20:28:41,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=4944990.0, ans=0.2 2024-08-20 20:28:44,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4944990.0, ans=0.125 2024-08-20 20:28:54,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-08-20 20:28:55,333 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 20:28:56,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4944990.0, ans=0.09899494936611666 2024-08-20 20:29:09,521 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 20:29:16,160 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5550, loss[loss=0.1085, beats_loss=0.01142, ecapa_loss=0.0001152, whisper_loss=0.09597, over 22127.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01027, ecapa_loss=0.0001393, whisper_loss=0.09057, over 3776099.77 frames. ], batch size: 85, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:29:35,754 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 20:29:46,599 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
14 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 20:29:59,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4945390.0, ans=0.125 2024-08-20 20:30:01,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4945390.0, ans=0.125 2024-08-20 20:30:05,817 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 10 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 20:30:07,564 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-20 20:30:12,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.276e+01 2.520e+01 2.741e+01 3.796e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-20 20:30:25,088 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 20:30:34,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4945590.0, ans=0.125 2024-08-20 20:30:34,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. limit=10.0 2024-08-20 20:30:44,036 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5600, loss[loss=0.1035, beats_loss=0.009481, ecapa_loss=0.0001656, whisper_loss=0.09236, over 21667.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01028, ecapa_loss=0.0001398, whisper_loss=0.09002, over 3776591.08 frames. 
], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:31:02,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4945790.0, ans=0.125 2024-08-20 20:31:07,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4945790.0, ans=0.0 2024-08-20 20:31:26,954 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:31:38,862 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 20:32:06,030 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 20:32:09,855 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 20:32:12,740 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5650, loss[loss=0.09722, beats_loss=0.00965, ecapa_loss=0.0001686, whisper_loss=0.08588, over 23182.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001398, whisper_loss=0.08989, over 3805310.63 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:32:20,991 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 20:32:26,725 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 20:33:09,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.323e+01 2.509e+01 2.836e+01 4.746e+01, threshold=5.018e+01, percent-clipped=0.0 2024-08-20 20:33:20,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4946490.0, ans=0.0 2024-08-20 20:33:43,477 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5700, loss[loss=0.08446, beats_loss=0.01395, ecapa_loss=0.0001294, whisper_loss=0.06922, over 21272.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001386, whisper_loss=0.09016, over 3818019.53 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:33:55,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4946690.0, ans=0.1 2024-08-20 20:34:11,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4946790.0, ans=0.125 2024-08-20 20:34:14,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4946790.0, ans=0.05 2024-08-20 20:34:20,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4946790.0, ans=0.1 2024-08-20 20:34:22,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4946890.0, ans=0.0 2024-08-20 20:34:46,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.71 vs. 
limit=22.5 2024-08-20 20:34:59,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4947090.0, ans=0.125 2024-08-20 20:35:13,000 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 20 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 20:35:15,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4947090.0, ans=0.125 2024-08-20 20:35:19,982 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5750, loss[loss=0.1113, beats_loss=0.01113, ecapa_loss=0.0001624, whisper_loss=0.09858, over 22010.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.08915, over 3794042.38 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:35:25,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4947190.0, ans=0.0 2024-08-20 20:35:26,764 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 20:36:05,331 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 27 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 20:36:23,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.253e+01 2.565e+01 2.811e+01 3.552e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 20:36:43,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4947590.0, ans=0.2 2024-08-20 20:36:57,632 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5800, loss[loss=0.1285, beats_loss=0.007805, ecapa_loss=0.0001516, whisper_loss=0.1192, over 23062.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001389, whisper_loss=0.08957, over 3825998.06 frames. 
], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:37:02,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4947690.0, ans=0.1 2024-08-20 20:37:03,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-08-20 20:37:03,799 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 26 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 20:37:12,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4947690.0, ans=0.0 2024-08-20 20:37:25,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4947790.0, ans=0.125 2024-08-20 20:37:42,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4947890.0, ans=0.0 2024-08-20 20:38:00,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=12.0 2024-08-20 20:38:17,075 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 14 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-20 20:38:23,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4948090.0, ans=0.2 2024-08-20 20:38:26,704 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 20:38:29,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4948090.0, ans=0.125 2024-08-20 20:38:30,173 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 
20 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 20:38:30,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4948090.0, ans=0.125 2024-08-20 20:38:36,421 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5850, loss[loss=0.1166, beats_loss=0.009328, ecapa_loss=0.0001452, whisper_loss=0.1058, over 22233.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001387, whisper_loss=0.08913, over 3802103.13 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:38:41,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4948190.0, ans=0.125 2024-08-20 20:38:56,057 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 20:39:00,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4948290.0, ans=0.0 2024-08-20 20:39:05,450 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 20:39:20,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4948390.0, ans=15.0 2024-08-20 20:39:33,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.476e+01 2.710e+01 3.923e+01, threshold=4.952e+01, percent-clipped=0.0 2024-08-20 20:39:41,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4948490.0, ans=0.2 2024-08-20 20:39:47,419 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 23 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 20:39:48,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. 
limit=22.5 2024-08-20 20:39:51,049 WARNING [optim.py:496] (3/4) Scaling gradients by 0.021723005920648575, model_norm_threshold=49.52134323120117 2024-08-20 20:39:51,215 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.306e+06, grad_sumsq=3.974e+05, orig_rms_sq=3.286e+00 2024-08-20 20:39:56,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-20 20:39:58,668 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 20:40:05,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4948690.0, ans=0.0 2024-08-20 20:40:06,630 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5900, loss[loss=0.08686, beats_loss=0.008966, ecapa_loss=0.0001274, whisper_loss=0.07662, over 15100.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001388, whisper_loss=0.08919, over 3806207.47 frames. ], batch size: 58, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:40:14,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4948690.0, ans=0.1 2024-08-20 20:40:14,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2024-08-20 20:40:20,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4948690.0, ans=0.1 2024-08-20 20:40:30,173 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
24 from LS+wenet, 26 from Vox, 20 fro AS 2024-08-20 20:40:34,762 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:40:39,727 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 22 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-20 20:40:46,557 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-20 20:40:59,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4948990.0, ans=0.2 2024-08-20 20:41:14,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=12.0 2024-08-20 20:41:36,089 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 5950, loss[loss=0.08533, beats_loss=0.01149, ecapa_loss=0.0001295, whisper_loss=0.07254, over 17273.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01052, ecapa_loss=0.000139, whisper_loss=0.08927, over 3801267.34 frames. ], batch size: 69, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:41:37,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4949190.0, ans=0.125 2024-08-20 20:41:52,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4949290.0, ans=0.0 2024-08-20 20:41:58,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4949290.0, ans=0.125 2024-08-20 20:42:01,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4949290.0, ans=0.125 2024-08-20 20:42:14,041 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 14 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 20:42:28,733 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 20:42:33,655 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.390e+01 2.635e+01 2.817e+01 2.280e+03, threshold=5.271e+01, percent-clipped=1.0 2024-08-20 20:42:43,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4949490.0, ans=0.0 2024-08-20 20:42:54,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4949590.0, ans=0.2 2024-08-20 20:42:54,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4949590.0, ans=0.0 2024-08-20 20:43:06,090 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6000, loss[loss=0.1147, beats_loss=0.01018, ecapa_loss=0.000142, whisper_loss=0.1031, over 23284.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.000139, whisper_loss=0.08987, over 3825863.92 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:43:06,091 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 20:43:58,007 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005083, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 20:44:22,254 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on SV_voxceleb1: loss=0.003999, beats_loss=0, ecapa_loss=0.0003999, whisper_loss=0, over 944235.00 frames. 2024-08-20 20:45:10,010 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0006, 0.0365, 0.0016, 0.0322, 0.0003, 0.0866, 0.0214, 0.0595], device='cuda:3') 2024-08-20 20:45:57,893 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 20:45:57,896 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 20:46:01,242 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 20:46:03,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4949690.0, ans=0.1 2024-08-20 20:46:15,339 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 18 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 20:46:25,905 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 21 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-20 20:46:36,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.65 vs. limit=22.5 2024-08-20 20:46:40,613 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 20:46:45,890 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 20:46:51,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4949890.0, ans=0.05 2024-08-20 20:46:53,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2024-08-20 20:46:57,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4949990.0, ans=0.125 2024-08-20 20:47:07,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2024-08-20 20:47:07,893 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 20:47:10,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4949990.0, ans=0.125 2024-08-20 20:47:20,858 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 20:47:32,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=22.5 2024-08-20 20:47:36,898 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6050, loss[loss=0.09108, beats_loss=0.0115, ecapa_loss=0.00011, whisper_loss=0.07848, over 21609.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01057, ecapa_loss=0.0001397, whisper_loss=0.08856, over 3824344.67 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:47:53,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4950190.0, ans=0.125 2024-08-20 20:48:06,401 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 20:48:10,560 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 20:48:24,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4950390.0, ans=0.05 2024-08-20 20:48:27,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4950390.0, ans=0.125 2024-08-20 20:48:50,451 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 20:48:53,428 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.341e+01 2.597e+01 2.876e+01 5.831e+01, threshold=5.193e+01, percent-clipped=1.0 2024-08-20 20:48:57,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4950490.0, ans=0.125 2024-08-20 20:48:59,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4950490.0, ans=0.125 2024-08-20 20:49:00,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4950490.0, ans=0.0 2024-08-20 20:49:01,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=12.0 2024-08-20 20:49:21,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4950590.0, ans=0.125 2024-08-20 20:49:27,821 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6100, loss[loss=0.09517, beats_loss=0.01087, ecapa_loss=0.0001082, whisper_loss=0.08321, over 16654.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01064, ecapa_loss=0.0001384, whisper_loss=0.08902, over 3839077.95 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:49:31,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4950690.0, ans=0.0 2024-08-20 20:49:52,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4950790.0, ans=0.0 2024-08-20 20:50:01,957 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 20:50:32,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.30 vs. 
limit=15.0 2024-08-20 20:50:46,705 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 39 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 20:51:04,163 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 20:51:17,216 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6150, loss[loss=0.093, beats_loss=0.01209, ecapa_loss=0.0001327, whisper_loss=0.07959, over 22276.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01063, ecapa_loss=0.0001379, whisper_loss=0.08905, over 3822606.69 frames. ], batch size: 86, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:51:30,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4951190.0, ans=0.2 2024-08-20 20:51:33,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-08-20 20:51:39,461 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 20:51:52,059 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 33 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 20:52:07,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4951390.0, ans=0.0 2024-08-20 20:52:07,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4951390.0, ans=0.125 2024-08-20 20:52:27,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.295e+01 2.472e+01 2.689e+01 4.282e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-20 20:52:42,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4951590.0, ans=0.0 2024-08-20 20:52:43,530 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 
26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 20:52:54,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2024-08-20 20:53:03,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4951590.0, ans=0.125 2024-08-20 20:53:06,735 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6200, loss[loss=0.07297, beats_loss=0.01086, ecapa_loss=0.0001158, whisper_loss=0.06095, over 14059.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01068, ecapa_loss=0.0001362, whisper_loss=0.089, over 3818111.93 frames. ], batch size: 54, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:53:37,762 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-20 20:53:41,906 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 20:54:11,308 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 16 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 20:54:30,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4951990.0, ans=0.09899494936611666 2024-08-20 20:54:56,590 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6250, loss[loss=0.0917, beats_loss=0.008368, ecapa_loss=0.0001586, whisper_loss=0.08175, over 14890.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01064, ecapa_loss=0.0001369, whisper_loss=0.08895, over 3803339.65 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:55:08,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4952190.0, ans=0.125 2024-08-20 20:55:14,121 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 20:55:33,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4952290.0, ans=0.1 2024-08-20 20:56:06,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.296e+01 2.528e+01 2.851e+01 2.776e+02, threshold=5.056e+01, percent-clipped=4.0 2024-08-20 20:56:24,125 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 13 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 20:56:37,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.97 vs. limit=15.0 2024-08-20 20:56:45,030 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 20:56:47,425 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6300, loss[loss=0.09553, beats_loss=0.01144, ecapa_loss=0.0001122, whisper_loss=0.08297, over 15310.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01057, ecapa_loss=0.0001398, whisper_loss=0.08884, over 3791471.81 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:57:06,873 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 18 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-20 20:58:03,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.80 vs. 
limit=15.0 2024-08-20 20:58:17,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4952990.0, ans=0.125 2024-08-20 20:58:19,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4952990.0, ans=0.125 2024-08-20 20:58:19,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4952990.0, ans=0.125 2024-08-20 20:58:28,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-20 20:58:30,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2024-08-20 20:58:43,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2024-08-20 20:58:44,230 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6350, loss[loss=0.1021, beats_loss=0.01073, ecapa_loss=0.0001407, whisper_loss=0.08995, over 20393.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.08901, over 3813754.85 frames. ], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:58:46,452 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 16 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-20 20:59:00,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4953190.0, ans=0.125 2024-08-20 20:59:05,988 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
35 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 20:59:12,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4953290.0, ans=0.125 2024-08-20 20:59:19,355 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 20:59:24,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0 2024-08-20 20:59:28,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2024-08-20 20:59:45,559 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 28 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-20 20:59:46,997 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 20:59:52,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.372e+01 2.595e+01 2.941e+01 1.196e+02, threshold=5.191e+01, percent-clipped=6.0 2024-08-20 20:59:58,914 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 21:00:02,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4953490.0, ans=0.2 2024-08-20 21:00:15,712 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 21:00:29,446 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6400, loss[loss=0.1114, beats_loss=0.0087, ecapa_loss=0.0001276, whisper_loss=0.1014, over 20333.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.09008, over 3797590.14 frames. 
], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:00:34,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=12.0 2024-08-20 21:00:34,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2024-08-20 21:00:59,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=12.0 2024-08-20 21:01:06,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4953890.0, ans=0.125 2024-08-20 21:01:44,940 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 21:01:51,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4954090.0, ans=0.125 2024-08-20 21:01:56,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4954090.0, ans=0.125 2024-08-20 21:01:58,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4954090.0, ans=0.125 2024-08-20 21:02:08,847 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6450, loss[loss=0.0995, beats_loss=0.01159, ecapa_loss=0.0001377, whisper_loss=0.08654, over 21175.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01032, ecapa_loss=0.000142, whisper_loss=0.08965, over 3778823.14 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:02:26,436 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 21:02:27,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4954290.0, ans=0.1 2024-08-20 21:02:55,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4954390.0, ans=0.125 2024-08-20 21:02:57,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4954390.0, ans=0.0 2024-08-20 21:03:05,810 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 21:03:11,012 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.299e+01 2.553e+01 2.896e+01 1.351e+02, threshold=5.106e+01, percent-clipped=1.0 2024-08-20 21:03:14,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4954490.0, ans=0.0 2024-08-20 21:03:17,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=22.5 2024-08-20 21:03:18,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4954490.0, ans=0.09899494936611666 2024-08-20 21:03:19,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-08-20 21:03:20,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4954490.0, ans=0.0 2024-08-20 21:03:23,702 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 
17 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 21:03:31,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4954590.0, ans=0.125 2024-08-20 21:03:44,647 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6500, loss[loss=0.0852, beats_loss=0.008201, ecapa_loss=0.0002164, whisper_loss=0.07483, over 18920.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001428, whisper_loss=0.08983, over 3789426.17 frames. ], batch size: 84, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:03:52,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2024-08-20 21:03:52,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=12.0 2024-08-20 21:04:48,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4954890.0, ans=0.0 2024-08-20 21:05:30,064 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 20 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-20 21:05:33,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4955090.0, ans=0.125 2024-08-20 21:05:40,972 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6550, loss[loss=0.09979, beats_loss=0.01136, ecapa_loss=0.0001218, whisper_loss=0.08721, over 23612.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01027, ecapa_loss=0.0001432, whisper_loss=0.09013, over 3815908.30 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:05:41,219 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
29 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-20 21:06:10,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4955290.0, ans=0.0 2024-08-20 21:06:57,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.272e+01 2.491e+01 2.852e+01 4.089e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 21:07:21,094 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 19 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 21:07:30,697 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 21:07:37,681 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6600, loss[loss=0.09827, beats_loss=0.01006, ecapa_loss=0.0001441, whisper_loss=0.08676, over 19706.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001415, whisper_loss=0.08983, over 3829769.38 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:07:47,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2024-08-20 21:07:59,360 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 21:08:24,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4955790.0, ans=0.0 2024-08-20 21:08:25,123 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 21 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 21:08:43,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4955890.0, ans=0.0 2024-08-20 21:08:50,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=15.0 2024-08-20 21:08:56,166 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
14 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 21:09:01,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.19 vs. limit=6.0 2024-08-20 21:09:04,577 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 24 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-20 21:09:10,501 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 21:09:28,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4956090.0, ans=0.125 2024-08-20 21:09:47,610 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6650, loss[loss=0.1066, beats_loss=0.01063, ecapa_loss=0.0001337, whisper_loss=0.0946, over 20816.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01028, ecapa_loss=0.000141, whisper_loss=0.09105, over 3838967.86 frames. ], batch size: 85, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:10:11,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-20 21:10:27,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.49 vs. limit=22.5 2024-08-20 21:10:28,668 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 21:10:47,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.323e+01 2.536e+01 2.903e+01 4.430e+01, threshold=5.073e+01, percent-clipped=0.0 2024-08-20 21:10:48,064 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 21:10:50,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.87 vs. 
limit=12.0 2024-08-20 21:11:18,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4956690.0, ans=0.2 2024-08-20 21:11:18,826 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6700, loss[loss=0.09592, beats_loss=0.01063, ecapa_loss=0.0001467, whisper_loss=0.08383, over 21437.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01029, ecapa_loss=0.0001415, whisper_loss=0.09097, over 3844210.22 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:11:23,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4956690.0, ans=0.125 2024-08-20 21:11:28,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.79 vs. limit=22.5 2024-08-20 21:11:37,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4956790.0, ans=0.0 2024-08-20 21:11:53,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4956890.0, ans=0.0 2024-08-20 21:11:54,569 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 21:12:00,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4956890.0, ans=0.0 2024-08-20 21:12:00,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4956890.0, ans=0.125 2024-08-20 21:12:23,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4956990.0, ans=0.1 2024-08-20 21:12:28,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4957090.0, ans=0.1 2024-08-20 21:12:33,162 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 36 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 21:12:37,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-20 21:12:46,392 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6750, loss[loss=0.1004, beats_loss=0.01075, ecapa_loss=0.0001198, whisper_loss=0.08846, over 19111.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01025, ecapa_loss=0.0001422, whisper_loss=0.09136, over 3810747.97 frames. 
], batch size: 76, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:13:26,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4957390.0, ans=0.0 2024-08-20 21:13:44,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.394e+01 2.668e+01 3.101e+01 4.157e+01, threshold=5.336e+01, percent-clipped=0.0 2024-08-20 21:13:44,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4957490.0, ans=0.2 2024-08-20 21:13:46,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4957490.0, ans=0.125 2024-08-20 21:13:53,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4957490.0, ans=0.125 2024-08-20 21:14:04,793 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 21:14:11,816 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 21:14:12,991 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6800, loss[loss=0.08566, beats_loss=0.01018, ecapa_loss=0.0001232, whisper_loss=0.07424, over 16771.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01022, ecapa_loss=0.000142, whisper_loss=0.09119, over 3813828.77 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:14:13,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4957690.0, ans=0.125 2024-08-20 21:14:22,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4957690.0, ans=0.2 2024-08-20 21:14:24,684 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 21:14:30,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4957790.0, ans=0.0 2024-08-20 21:14:43,010 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.174e+00 2024-08-20 21:14:45,693 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 18 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-20 21:14:46,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4957890.0, ans=0.2 2024-08-20 21:14:50,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=12.0 2024-08-20 21:15:17,280 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 21:15:18,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-20 21:15:26,150 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 21:15:39,295 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6850, loss[loss=0.07081, beats_loss=0.01392, ecapa_loss=0.0001298, whisper_loss=0.05559, over 18412.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01027, ecapa_loss=0.0001417, whisper_loss=0.09079, over 3833852.47 frames. ], batch size: 75, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:15:39,812 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-20 21:16:11,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-08-20 21:16:17,716 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 21:16:21,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4958390.0, ans=0.125 2024-08-20 21:16:23,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4958390.0, ans=0.125 2024-08-20 21:16:30,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4958490.0, ans=0.0 2024-08-20 21:16:36,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.275e+01 2.461e+01 2.676e+01 7.935e+01, threshold=4.923e+01, percent-clipped=1.0 2024-08-20 21:16:40,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4958490.0, ans=0.0 2024-08-20 21:16:45,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4958490.0, ans=0.1 2024-08-20 21:16:56,984 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 21:17:06,145 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6900, loss[loss=0.1142, beats_loss=0.01018, ecapa_loss=0.0001389, whisper_loss=0.1027, over 22671.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01029, ecapa_loss=0.0001404, whisper_loss=0.09064, over 3797899.80 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:17:09,262 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 21:17:09,892 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.126e+05 2024-08-20 21:17:41,956 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 21:17:46,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4958890.0, ans=0.125 2024-08-20 21:18:01,084 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 16 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 21:18:03,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4958990.0, ans=0.0 2024-08-20 21:18:11,626 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 21:18:31,990 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 6950, loss[loss=0.0718, beats_loss=0.01191, ecapa_loss=0.0001862, whisper_loss=0.05803, over 18461.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01026, ecapa_loss=0.0001402, whisper_loss=0.09022, over 3773812.69 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:18:32,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4959190.0, ans=0.95 2024-08-20 21:18:39,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-20 21:19:29,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.291e+01 2.502e+01 2.810e+01 1.652e+02, threshold=5.004e+01, percent-clipped=1.0 2024-08-20 21:19:58,703 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7000, loss[loss=0.08022, beats_loss=0.01205, ecapa_loss=0.0001093, whisper_loss=0.06708, over 14887.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001389, whisper_loss=0.08972, over 3774478.73 frames. 
], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:20:06,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4959690.0, ans=0.125 2024-08-20 21:20:10,238 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 28 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-20 21:20:19,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4959790.0, ans=0.125 2024-08-20 21:20:28,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4959790.0, ans=0.07 2024-08-20 21:20:31,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4959790.0, ans=0.125 2024-08-20 21:20:34,367 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 21:20:47,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4959890.0, ans=0.125 2024-08-20 21:20:47,641 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-08-20 21:21:17,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=22.5 2024-08-20 21:21:23,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. 
limit=15.0 2024-08-20 21:21:24,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4960090.0, ans=0.125 2024-08-20 21:21:29,268 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7050, loss[loss=0.0968, beats_loss=0.01219, ecapa_loss=0.0001428, whisper_loss=0.08318, over 21694.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.08958, over 3779012.44 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:21:30,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-20 21:21:48,415 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 21:22:25,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.286e+01 2.529e+01 2.848e+01 4.260e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-20 21:22:46,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4960590.0, ans=0.0 2024-08-20 21:22:49,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4960590.0, ans=0.0 2024-08-20 21:22:55,538 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7100, loss[loss=0.09677, beats_loss=0.009355, ecapa_loss=0.000163, whisper_loss=0.08579, over 18636.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.08891, over 3769531.79 frames. 
], batch size: 76, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:22:58,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4960690.0, ans=0.1 2024-08-20 21:23:01,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4960690.0, ans=0.125 2024-08-20 21:23:04,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2024-08-20 21:23:07,293 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 19 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-20 21:23:07,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4960690.0, ans=0.125 2024-08-20 21:23:14,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4960790.0, ans=0.0 2024-08-20 21:23:16,213 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 21:23:16,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4960790.0, ans=0.1 2024-08-20 21:23:20,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4960790.0, ans=0.0 2024-08-20 21:23:44,724 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 25 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 21:23:51,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4960990.0, ans=0.2 2024-08-20 21:24:09,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.73 vs. 
limit=10.0 2024-08-20 21:24:11,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-20 21:24:12,059 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 21:24:16,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4961090.0, ans=0.125 2024-08-20 21:24:24,150 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7150, loss[loss=0.103, beats_loss=0.01081, ecapa_loss=0.0001542, whisper_loss=0.09065, over 22535.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01054, ecapa_loss=0.0001382, whisper_loss=0.08895, over 3777943.89 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:24:27,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4961190.0, ans=0.125 2024-08-20 21:24:51,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4961290.0, ans=0.0 2024-08-20 21:24:51,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4961290.0, ans=0.1 2024-08-20 21:24:57,684 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 15 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-20 21:25:21,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.246e+01 2.471e+01 2.747e+01 3.291e+02, threshold=4.942e+01, percent-clipped=1.0 2024-08-20 21:25:38,114 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 27 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 21:25:44,780 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
19 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-20 21:25:51,818 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7200, loss[loss=0.1216, beats_loss=0.009756, ecapa_loss=0.0001537, whisper_loss=0.1103, over 23215.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001377, whisper_loss=0.08888, over 3755799.11 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:26:10,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4961790.0, ans=0.125 2024-08-20 21:26:24,092 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-20 21:26:32,296 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05625057592988014, model_norm_threshold=49.41666793823242 2024-08-20 21:26:32,466 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.330e+05, grad_sumsq=1.330e+05, orig_rms_sq=1.000e+00 2024-08-20 21:26:36,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4961890.0, ans=0.125 2024-08-20 21:26:45,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0 2024-08-20 21:27:21,186 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7250, loss[loss=0.09557, beats_loss=0.009426, ecapa_loss=0.0001588, whisper_loss=0.08456, over 22059.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001379, whisper_loss=0.08978, over 3753309.29 frames. 
], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:27:36,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4962190.0, ans=0.125 2024-08-20 21:27:38,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4962290.0, ans=0.1 2024-08-20 21:27:41,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4962290.0, ans=0.1 2024-08-20 21:28:03,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4962390.0, ans=0.0 2024-08-20 21:28:05,452 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 21:28:05,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4962390.0, ans=0.125 2024-08-20 21:28:18,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.295e+01 2.557e+01 2.872e+01 8.785e+02, threshold=5.114e+01, percent-clipped=5.0 2024-08-20 21:28:28,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.35 vs. limit=22.5 2024-08-20 21:28:38,440 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 21:28:49,186 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7300, loss[loss=0.1023, beats_loss=0.01186, ecapa_loss=0.0001451, whisper_loss=0.08898, over 22050.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001389, whisper_loss=0.08932, over 3780922.45 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:28:55,960 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 21:29:08,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-08-20 21:29:44,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4962990.0, ans=0.125 2024-08-20 21:29:50,180 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 30 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 21:29:57,965 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 22 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 21:30:03,288 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 25 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 21:30:12,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4963090.0, ans=0.1 2024-08-20 21:30:15,017 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7350, loss[loss=0.09943, beats_loss=0.009903, ecapa_loss=0.000144, whisper_loss=0.08809, over 22689.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.08933, over 3778192.75 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:30:23,414 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.00 vs. 
limit=22.5 2024-08-20 21:30:26,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4963190.0, ans=0.0 2024-08-20 21:30:33,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4963290.0, ans=0.125 2024-08-20 21:30:38,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4963290.0, ans=10.0 2024-08-20 21:30:40,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4963290.0, ans=0.125 2024-08-20 21:30:43,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4963290.0, ans=10.0 2024-08-20 21:30:45,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4963290.0, ans=0.2 2024-08-20 21:30:48,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4963390.0, ans=0.125 2024-08-20 21:30:53,377 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 36 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 21:31:07,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4963490.0, ans=0.07 2024-08-20 21:31:11,466 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.257e+01 2.510e+01 2.739e+01 2.616e+02, threshold=5.019e+01, percent-clipped=1.0 2024-08-20 21:31:12,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4963490.0, ans=0.2 2024-08-20 21:31:17,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.61 vs. 
limit=15.0 2024-08-20 21:31:23,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4963590.0, ans=0.125 2024-08-20 21:31:25,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4963590.0, ans=0.0 2024-08-20 21:31:40,859 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7400, loss[loss=0.1059, beats_loss=0.008546, ecapa_loss=0.0001226, whisper_loss=0.0961, over 17550.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001395, whisper_loss=0.08966, over 3808068.35 frames. ], batch size: 69, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:31:41,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4963690.0, ans=0.09899494936611666 2024-08-20 21:32:00,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4963790.0, ans=0.0 2024-08-20 21:32:49,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.04 vs. limit=22.5 2024-08-20 21:33:00,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4964090.0, ans=0.125 2024-08-20 21:33:01,003 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 25 from LS+wenet, 29 from Vox, 23 fro AS 2024-08-20 21:33:09,747 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7450, loss[loss=0.1201, beats_loss=0.01033, ecapa_loss=0.0001521, whisper_loss=0.1082, over 23336.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001408, whisper_loss=0.09023, over 3831261.21 frames. 
], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:33:16,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4964190.0, ans=0.125 2024-08-20 21:33:30,536 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 26 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-20 21:33:36,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4964290.0, ans=0.2 2024-08-20 21:33:45,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4964390.0, ans=0.125 2024-08-20 21:33:50,029 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 21:33:50,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4964390.0, ans=0.125 2024-08-20 21:33:54,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4964390.0, ans=0.0 2024-08-20 21:33:54,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. 
limit=15.0 2024-08-20 21:34:08,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.276e+01 2.553e+01 2.837e+01 3.852e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-20 21:34:11,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4964490.0, ans=0.125 2024-08-20 21:34:17,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4964490.0, ans=0.0 2024-08-20 21:34:39,233 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7500, loss[loss=0.08373, beats_loss=0.0128, ecapa_loss=0.0001146, whisper_loss=0.06979, over 21259.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001406, whisper_loss=0.08947, over 3854143.86 frames. ], batch size: 86, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:34:43,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4964690.0, ans=0.1 2024-08-20 21:34:49,764 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 25 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 21:35:18,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4964890.0, ans=0.125 2024-08-20 21:35:37,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4964990.0, ans=0.1 2024-08-20 21:35:42,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4964990.0, ans=0.0 2024-08-20 21:36:05,648 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7550, loss[loss=0.1124, beats_loss=0.0114, ecapa_loss=0.0001339, whisper_loss=0.09964, over 23300.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.08969, over 3861224.58 frames. 
], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:36:19,745 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 16 from Vox, 38 from AS 2024-08-20 21:36:39,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4965390.0, ans=0.125 2024-08-20 21:36:43,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=22.5 2024-08-20 21:37:02,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.306e+01 2.507e+01 2.711e+01 6.032e+01, threshold=5.014e+01, percent-clipped=1.0 2024-08-20 21:37:14,815 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 from AS 2024-08-20 21:37:25,054 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 25 from LS+wenet, 23 from Vox, 20 from AS 2024-08-20 21:37:30,232 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 27 from LS+wenet, 27 from Vox, 41 from AS 2024-08-20 21:37:31,662 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7600, loss[loss=0.08964, beats_loss=0.009744, ecapa_loss=0.0001398, whisper_loss=0.0785, over 23681.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001403, whisper_loss=0.08986, over 3849482.92 frames. ], batch size: 95, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:37:34,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4965690.0, ans=0.1 2024-08-20 21:37:37,030 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
18 from LS+wenet, 18 from Vox, 31 from AS 2024-08-20 21:37:44,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4965690.0, ans=0.2 2024-08-20 21:37:47,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4965790.0, ans=0.125 2024-08-20 21:37:47,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4965790.0, ans=0.1 2024-08-20 21:37:50,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-08-20 21:38:18,367 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 21:38:28,373 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 25 from LS+wenet, 25 from Vox, 24 from AS 2024-08-20 21:38:39,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4966090.0, ans=0.125 2024-08-20 21:38:40,930 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.934e+00 2024-08-20 21:38:42,095 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 26 from Vox, 33 from AS 2024-08-20 21:38:46,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4966090.0, ans=0.0 2024-08-20 21:38:47,404 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 26 from LS+wenet, 12 from Vox, 23 from AS 2024-08-20 21:38:56,968 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7650, loss[loss=0.1198, beats_loss=0.01049, ecapa_loss=0.0001298, whisper_loss=0.108, over 19082.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001404, whisper_loss=0.08998, over 3864293.39 frames. ], batch size: 72, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:39:31,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4966390.0, ans=0.0 2024-08-20 21:39:53,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.307e+01 2.519e+01 2.833e+01 3.884e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 21:40:23,461 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7700, loss[loss=0.08561, beats_loss=0.01034, ecapa_loss=0.0001428, whisper_loss=0.07384, over 22183.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001404, whisper_loss=0.09072, over 3831901.48 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:40:29,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4966690.0, ans=0.125 2024-08-20 21:40:36,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4966690.0, ans=0.1 2024-08-20 21:40:36,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4966690.0, ans=0.0 2024-08-20 21:40:39,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4966790.0, ans=0.125 2024-08-20 21:40:55,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.56 vs. 
limit=15.0 2024-08-20 21:40:59,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4966890.0, ans=0.2 2024-08-20 21:41:01,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4966890.0, ans=0.0 2024-08-20 21:41:03,076 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 18 from LS+wenet, 23 from Vox, 37 from AS 2024-08-20 21:41:06,252 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 from AS 2024-08-20 21:41:25,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4966990.0, ans=0.125 2024-08-20 21:41:48,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2024-08-20 21:41:48,702 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7750, loss[loss=0.115, beats_loss=0.01015, ecapa_loss=0.0001203, whisper_loss=0.1037, over 22657.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.09002, over 3845987.49 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:42:25,519 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2024-08-20 21:42:47,270 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.264e+01 2.525e+01 2.747e+01 3.905e+01, threshold=5.051e+01, percent-clipped=0.0 2024-08-20 21:42:48,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4967490.0, ans=0.125 2024-08-20 21:42:50,887 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
25 from LS+wenet, 25 from Vox, 37 from AS 2024-08-20 21:43:03,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4967590.0, ans=0.0 2024-08-20 21:43:10,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4967590.0, ans=0.1 2024-08-20 21:43:16,723 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7800, loss[loss=0.119, beats_loss=0.00964, ecapa_loss=0.000132, whisper_loss=0.108, over 19666.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001394, whisper_loss=0.08987, over 3788498.74 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:43:21,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2024-08-20 21:43:42,493 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 from AS 2024-08-20 21:43:53,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4967890.0, ans=0.0 2024-08-20 21:44:05,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4967890.0, ans=0.015 2024-08-20 21:44:22,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4967990.0, ans=0.1 2024-08-20 21:44:43,125 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7850, loss[loss=0.1095, beats_loss=0.011, ecapa_loss=0.0001646, whisper_loss=0.09687, over 20853.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.09009, over 3822123.47 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:44:58,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4968190.0, ans=0.125 2024-08-20 21:45:14,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4968290.0, ans=0.125 2024-08-20 21:45:15,543 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 23 from LS+wenet, 27 from Vox, 41 from AS 2024-08-20 21:45:20,965 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 15 from LS+wenet, 16 from Vox, 24 from AS 2024-08-20 21:45:41,493 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.317e+01 2.497e+01 2.913e+01 5.826e+01, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 21:45:48,512 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 from AS 2024-08-20 21:45:50,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4968490.0, ans=0.125 2024-08-20 21:46:11,159 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7900, loss[loss=0.1233, beats_loss=0.008113, ecapa_loss=0.0001424, whisper_loss=0.1137, over 17063.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.0001409, whisper_loss=0.08962, over 3809275.81 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:46:20,422 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=12.0 2024-08-20 21:46:24,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2024-08-20 21:46:25,049 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 
23 from LS+wenet, 29 from Vox, 34 from AS 2024-08-20 21:46:38,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.21 vs. limit=22.5 2024-08-20 21:46:47,857 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.506e+05 2024-08-20 21:47:05,352 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2024-08-20 21:47:09,293 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.438e+00 2024-08-20 21:47:23,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4969090.0, ans=0.125 2024-08-20 21:47:31,886 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 22 from LS+wenet, 26 from Vox, 37 from AS 2024-08-20 21:47:38,931 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 7950, loss[loss=0.1106, beats_loss=0.007483, ecapa_loss=0.0001621, whisper_loss=0.1015, over 21519.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01031, ecapa_loss=0.0001408, whisper_loss=0.08937, over 3808276.50 frames. ], batch size: 84, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:47:39,171 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 from AS 2024-08-20 21:47:42,364 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 29 from Vox, 34 from AS 2024-08-20 21:47:52,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=12.0 2024-08-20 21:47:59,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4969290.0, ans=0.2 2024-08-20 21:48:32,199 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 
15 from LS+wenet, 13 from Vox, 22 from AS 2024-08-20 21:48:37,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.335e+01 2.581e+01 2.814e+01 4.962e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-20 21:48:44,822 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 15 from LS+wenet, 12 from Vox, 28 from AS 2024-08-20 21:49:07,279 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8000, loss[loss=0.1028, beats_loss=0.01012, ecapa_loss=0.0001502, whisper_loss=0.0912, over 16297.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001405, whisper_loss=0.08932, over 3793742.28 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:49:07,480 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 from AS 2024-08-20 21:49:08,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=12.0 2024-08-20 21:49:47,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4969890.0, ans=0.0 2024-08-20 21:50:00,727 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0865812599658966, model_norm_threshold=51.61667251586914 2024-08-20 21:50:00,898 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.075e+04, grad_sumsq=5.075e+04, orig_rms_sq=1.000e+00 2024-08-20 21:50:30,146 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 from AS 2024-08-20 21:50:34,968 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8050, loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001636, whisper_loss=0.09092, over 21409.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.08928, over 3763273.73 frames. 
], batch size: 93, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:50:49,622 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 from AS 2024-08-20 21:51:27,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4970490.0, ans=0.125 2024-08-20 21:51:37,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.558e+01 2.951e+01 5.962e+02, threshold=5.117e+01, percent-clipped=2.0 2024-08-20 21:52:03,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-08-20 21:52:03,753 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8100, loss[loss=0.1202, beats_loss=0.007792, ecapa_loss=0.0001756, whisper_loss=0.1107, over 16522.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.00014, whisper_loss=0.08973, over 3767707.05 frames. ], batch size: 67, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:52:20,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4970690.0, ans=10.0 2024-08-20 21:52:21,591 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 from AS 2024-08-20 21:52:41,990 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 26 from LS+wenet, 22 from Vox, 25 from AS 2024-08-20 21:52:47,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4970890.0, ans=0.125 2024-08-20 21:52:55,228 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 from AS 2024-08-20 21:53:27,514 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 
29 from LS+wenet, 25 from Vox, 41 from AS 2024-08-20 21:53:34,300 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8150, loss[loss=0.09394, beats_loss=0.009192, ecapa_loss=0.0001631, whisper_loss=0.08312, over 16478.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.08992, over 3769127.82 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:54:01,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4971290.0, ans=0.0 2024-08-20 21:54:04,057 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.19 vs. limit=22.5 2024-08-20 21:54:11,917 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 from AS 2024-08-20 21:54:24,458 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS 2024-08-20 21:54:31,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4971490.0, ans=0.0 2024-08-20 21:54:33,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2024-08-20 21:54:37,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.199e+01 2.482e+01 2.728e+01 1.075e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-20 21:54:43,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4971490.0, ans=0.125 2024-08-20 21:55:04,249 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8200, loss[loss=0.105, beats_loss=0.01128, ecapa_loss=0.0001341, whisper_loss=0.09239, over 22294.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.09045, over 3802884.51 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:55:10,152 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 from AS 2024-08-20 21:55:48,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4971890.0, ans=0.09899494936611666 2024-08-20 21:55:55,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4971890.0, ans=0.0 2024-08-20 21:56:00,720 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 from AS 2024-08-20 21:56:12,958 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-20 21:56:15,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-20 21:56:17,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-20 21:56:32,531 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 from AS 2024-08-20 21:56:35,517 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8250, loss[loss=0.1064, beats_loss=0.009682, ecapa_loss=0.0001213, whisper_loss=0.09547, over 20395.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001385, whisper_loss=0.0908, over 3818124.79 frames. ], batch size: 80, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:56:36,271 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 29 from LS+wenet, 22 from Vox, 33 from AS 2024-08-20 21:56:43,753 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
20 from LS+wenet, 15 from Vox, 30 from AS 2024-08-20 21:56:49,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4972190.0, ans=0.2 2024-08-20 21:56:56,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4972290.0, ans=0.2 2024-08-20 21:57:27,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4972490.0, ans=0.0 2024-08-20 21:57:29,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4972490.0, ans=0.0 2024-08-20 21:57:34,319 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 27 from LS+wenet, 13 from Vox, 33 from AS 2024-08-20 21:57:34,657 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.309e+01 2024-08-20 21:57:37,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.205e+01 2.481e+01 2.878e+01 4.142e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-20 21:58:02,899 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8300, loss[loss=0.1076, beats_loss=0.01045, ecapa_loss=0.0001221, whisper_loss=0.09591, over 18960.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001386, whisper_loss=0.09012, over 3804668.27 frames. ], batch size: 73, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:58:23,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=12.0 2024-08-20 21:58:37,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.86 vs. 
limit=15.0 2024-08-20 21:58:39,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4972890.0, ans=0.025 2024-08-20 21:58:42,144 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 34 from Vox, 30 from AS 2024-08-20 21:59:06,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2024-08-20 21:59:29,684 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS 2024-08-20 21:59:31,380 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8350, loss[loss=0.1078, beats_loss=0.01015, ecapa_loss=0.0001423, whisper_loss=0.09625, over 17241.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.09029, over 3837426.42 frames. ], batch size: 69, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:59:39,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4973190.0, ans=0.1 2024-08-20 21:59:46,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=4973190.0, ans=15.0 2024-08-20 21:59:51,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4973290.0, ans=0.125 2024-08-20 21:59:56,533 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
28 from LS+wenet, 12 from Vox, 41 from AS 2024-08-20 21:59:58,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4973290.0, ans=0.2 2024-08-20 22:00:13,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4973390.0, ans=0.07 2024-08-20 22:00:18,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4973390.0, ans=0.125 2024-08-20 22:00:24,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-08-20 22:00:24,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-20 22:00:32,041 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 23 from LS+wenet, 19 from Vox, 51 from AS 2024-08-20 22:00:33,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.301e+01 2.484e+01 2.727e+01 5.310e+01, threshold=4.967e+01, percent-clipped=1.0 2024-08-20 22:00:38,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4973490.0, ans=0.125 2024-08-20 22:00:41,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=12.0 2024-08-20 22:00:59,447 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8400, loss[loss=0.1065, beats_loss=0.008903, ecapa_loss=0.0001277, whisper_loss=0.09635, over 22868.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001391, whisper_loss=0.09022, over 3843041.76 frames. 
], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:01:01,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4973690.0, ans=0.125 2024-08-20 22:01:11,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4973690.0, ans=0.2 2024-08-20 22:01:22,391 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 30 from LS+wenet, 13 from Vox, 44 from AS 2024-08-20 22:01:38,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4973890.0, ans=0.125 2024-08-20 22:01:54,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4973990.0, ans=0.125 2024-08-20 22:02:08,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4974090.0, ans=0.125 2024-08-20 22:02:24,595 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 from AS 2024-08-20 22:02:28,169 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8450, loss[loss=0.09079, beats_loss=0.01085, ecapa_loss=0.0001269, whisper_loss=0.07867, over 18914.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001381, whisper_loss=0.08964, over 3796009.72 frames. 
], batch size: 75, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:02:28,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4974190.0, ans=0.1 2024-08-20 22:02:34,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4974190.0, ans=0.2 2024-08-20 22:02:47,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4974290.0, ans=0.2 2024-08-20 22:02:48,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2024-08-20 22:02:52,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4974290.0, ans=0.125 2024-08-20 22:03:11,158 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 from AS 2024-08-20 22:03:21,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4974490.0, ans=0.0 2024-08-20 22:03:24,650 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 from AS 2024-08-20 22:03:32,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.304e+01 2.514e+01 2.804e+01 1.040e+02, threshold=5.029e+01, percent-clipped=2.0 2024-08-20 22:03:42,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2024-08-20 22:03:59,166 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8500, loss[loss=0.09845, beats_loss=0.009886, ecapa_loss=0.000146, whisper_loss=0.0871, over 18510.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.000139, whisper_loss=0.0892, over 3770932.89 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:04:01,601 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 from AS 2024-08-20 22:04:02,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=12.0 2024-08-20 22:04:15,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=12.0 2024-08-20 22:04:32,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4974790.0, ans=0.0 2024-08-20 22:04:46,531 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 30 from Vox, 30 from AS 2024-08-20 22:04:49,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4974890.0, ans=0.125 2024-08-20 22:05:12,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4975090.0, ans=0.0 2024-08-20 22:05:31,234 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8550, loss[loss=0.09276, beats_loss=0.01062, ecapa_loss=0.0001445, whisper_loss=0.08069, over 17965.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.08932, over 3780188.34 frames. ], batch size: 72, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:05:33,061 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-20 22:05:40,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4975190.0, ans=0.125 2024-08-20 22:05:40,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4975190.0, ans=0.2 2024-08-20 22:05:45,519 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 36 from LS+wenet, 24 from Vox, 23 from AS 2024-08-20 22:05:48,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4975290.0, ans=0.125 2024-08-20 22:06:07,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4975390.0, ans=0.125 2024-08-20 22:06:09,011 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 29 from Vox, 32 from AS 2024-08-20 22:06:26,729 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-20 22:06:30,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4975490.0, ans=0.0 2024-08-20 22:06:31,766 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 15 from LS+wenet, 11 from Vox, 31 from AS 2024-08-20 22:06:33,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.271e+01 2.473e+01 2.779e+01 6.630e+01, threshold=4.947e+01, percent-clipped=2.0 2024-08-20 22:06:56,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.85 vs. limit=15.0 2024-08-20 22:06:59,608 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8600, loss[loss=0.08233, beats_loss=0.01289, ecapa_loss=0.0001345, whisper_loss=0.06809, over 22344.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01027, ecapa_loss=0.0001395, whisper_loss=0.09018, over 3763456.72 frames. ], batch size: 94, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:07:46,105 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 22:07:47,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4975890.0, ans=0.125 2024-08-20 22:08:15,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4976090.0, ans=0.0 2024-08-20 22:08:17,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4976090.0, ans=0.2 2024-08-20 22:08:34,391 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8650, loss[loss=0.1044, beats_loss=0.009457, ecapa_loss=0.0001417, whisper_loss=0.09354, over 18801.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01029, ecapa_loss=0.0001395, whisper_loss=0.08982, over 3765512.54 frames. ], batch size: 76, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:09:09,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4976390.0, ans=0.125 2024-08-20 22:09:12,998 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 23 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 22:09:15,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4976390.0, ans=0.0 2024-08-20 22:09:20,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4976390.0, ans=0.2 2024-08-20 22:09:37,400 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.289e+01 2.459e+01 2.667e+01 5.043e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-20 22:09:50,553 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
34 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 22:09:54,484 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 17 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 22:09:56,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4976590.0, ans=0.125 2024-08-20 22:10:04,900 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8700, loss[loss=0.09447, beats_loss=0.009608, ecapa_loss=0.0001276, whisper_loss=0.08359, over 17317.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01031, ecapa_loss=0.0001394, whisper_loss=0.08931, over 3762823.51 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:10:13,797 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 22:10:14,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4976690.0, ans=0.0 2024-08-20 22:10:19,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4976690.0, ans=10.0 2024-08-20 22:10:22,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=22.5 2024-08-20 22:10:22,735 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 22:10:23,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4976690.0, ans=0.125 2024-08-20 22:10:32,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-08-20 22:10:33,656 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 
23 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 22:10:53,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2024-08-20 22:11:05,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4976990.0, ans=0.0 2024-08-20 22:11:10,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4976990.0, ans=0.0 2024-08-20 22:11:37,333 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 22:11:40,075 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8750, loss[loss=0.08435, beats_loss=0.01223, ecapa_loss=0.0001275, whisper_loss=0.07085, over 22366.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001394, whisper_loss=0.08945, over 3789613.84 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:11:41,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4977190.0, ans=0.125 2024-08-20 22:11:58,032 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 22:12:12,085 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 21 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-20 22:12:25,909 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 22:12:28,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4977390.0, ans=0.125 2024-08-20 22:12:33,882 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 
26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 22:12:37,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=4977490.0, ans=0.1 2024-08-20 22:12:41,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.311e+01 2.545e+01 2.845e+01 5.108e+01, threshold=5.089e+01, percent-clipped=1.0 2024-08-20 22:12:44,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4977490.0, ans=0.025 2024-08-20 22:12:51,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4977590.0, ans=0.0 2024-08-20 22:12:53,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4977590.0, ans=0.1 2024-08-20 22:12:58,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4977590.0, ans=0.125 2024-08-20 22:12:58,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4977590.0, ans=0.125 2024-08-20 22:13:02,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4977590.0, ans=0.1 2024-08-20 22:13:08,481 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8800, loss[loss=0.1077, beats_loss=0.009563, ecapa_loss=0.0001146, whisper_loss=0.09696, over 14700.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0103, ecapa_loss=0.0001384, whisper_loss=0.08914, over 3779555.05 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:13:12,880 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 22:13:16,507 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
15 from LS+wenet, 8 from Vox, 28 fro AS 2024-08-20 22:13:30,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4977790.0, ans=0.125 2024-08-20 22:13:33,556 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 22:13:51,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4977890.0, ans=0.125 2024-08-20 22:14:03,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4977990.0, ans=0.125 2024-08-20 22:14:12,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4977990.0, ans=0.0 2024-08-20 22:14:15,451 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 11 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 22:14:16,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4977990.0, ans=0.1 2024-08-20 22:14:36,183 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8850, loss[loss=0.1121, beats_loss=0.009636, ecapa_loss=0.0001063, whisper_loss=0.1014, over 17976.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01036, ecapa_loss=0.0001383, whisper_loss=0.08885, over 3763272.22 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:14:48,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2024-08-20 22:15:00,139 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
22 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-20 22:15:09,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4978290.0, ans=0.125 2024-08-20 22:15:13,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4978390.0, ans=0.05 2024-08-20 22:15:14,150 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 25 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-20 22:15:29,395 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 22:15:37,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.277e+01 2.445e+01 2.853e+01 5.587e+01, threshold=4.890e+01, percent-clipped=1.0 2024-08-20 22:15:47,316 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 22:15:49,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.88 vs. limit=6.0 2024-08-20 22:15:56,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4978590.0, ans=0.0 2024-08-20 22:16:03,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4978690.0, ans=0.0 2024-08-20 22:16:04,083 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8900, loss[loss=0.1102, beats_loss=0.01018, ecapa_loss=0.0001491, whisper_loss=0.09852, over 16297.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01039, ecapa_loss=0.0001379, whisper_loss=0.08867, over 3776873.89 frames. 
], batch size: 67, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:16:28,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4978790.0, ans=0.1 2024-08-20 22:16:35,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4978790.0, ans=0.07 2024-08-20 22:16:49,336 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 22:17:20,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4979090.0, ans=0.0 2024-08-20 22:17:32,256 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 8950, loss[loss=0.1216, beats_loss=0.009351, ecapa_loss=0.0001323, whisper_loss=0.1109, over 22184.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01035, ecapa_loss=0.0001397, whisper_loss=0.08874, over 3754327.65 frames. ], batch size: 83, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:17:42,861 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 22:17:53,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4979290.0, ans=0.125 2024-08-20 22:18:12,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4979390.0, ans=0.125 2024-08-20 22:18:19,543 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 22:18:32,868 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.269e+01 2.591e+01 2.866e+01 4.170e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-20 22:18:42,434 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
21 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 22:18:42,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.51 vs. limit=12.0 2024-08-20 22:18:44,273 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.967e+00 2024-08-20 22:18:59,286 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9000, loss[loss=0.09391, beats_loss=0.009254, ecapa_loss=0.0001195, whisper_loss=0.08346, over 17103.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001387, whisper_loss=0.08894, over 3801371.05 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:18:59,287 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 22:19:38,137 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005128, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 22:20:04,875 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on SV_voxceleb1: loss=0.003932, beats_loss=0, ecapa_loss=0.0003932, whisper_loss=0, over 944235.00 frames. 2024-08-20 22:20:42,000 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.0362, 2.1609, 2.1528, 2.0137, 2.5314, 2.0960, 2.1459, 2.0230], device='cuda:3') 2024-08-20 22:21:44,512 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 22:21:44,516 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 22:22:07,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4979790.0, ans=0.1 2024-08-20 22:22:34,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4979890.0, ans=0.0 2024-08-20 22:22:36,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.49 vs. limit=10.0 2024-08-20 22:22:37,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0 2024-08-20 22:22:40,451 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 22:22:42,385 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 22:23:07,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4980090.0, ans=0.0 2024-08-20 22:23:08,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4980090.0, ans=0.1 2024-08-20 22:23:11,225 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9050, loss[loss=0.1232, beats_loss=0.008043, ecapa_loss=0.0001407, whisper_loss=0.1137, over 17691.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001383, whisper_loss=0.0894, over 3786504.45 frames. 
], batch size: 69, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:23:12,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4980190.0, ans=0.1 2024-08-20 22:23:12,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4980190.0, ans=0.0 2024-08-20 22:23:21,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4980190.0, ans=0.07 2024-08-20 22:23:22,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4980190.0, ans=0.125 2024-08-20 22:23:33,931 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 22 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-20 22:23:55,079 WARNING [optim.py:496] (3/4) Scaling gradients by 0.043528925627470016, model_norm_threshold=51.82819747924805 2024-08-20 22:23:55,248 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.654e+05, grad_sumsq=2.654e+05, orig_rms_sq=1.000e+00 2024-08-20 22:23:56,263 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 22:24:07,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4980490.0, ans=0.125 2024-08-20 22:24:11,543 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.252e+01 2.512e+01 2.739e+01 1.191e+03, threshold=5.024e+01, percent-clipped=1.0 2024-08-20 22:24:27,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. 
limit=15.0 2024-08-20 22:24:36,505 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9100, loss[loss=0.07977, beats_loss=0.01121, ecapa_loss=0.0001436, whisper_loss=0.06712, over 20208.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001373, whisper_loss=0.08955, over 3820322.74 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:25:21,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4980890.0, ans=0.1 2024-08-20 22:25:21,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4980890.0, ans=0.125 2024-08-20 22:25:22,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4980890.0, ans=0.0 2024-08-20 22:25:42,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4981090.0, ans=0.0 2024-08-20 22:25:43,603 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 22:25:44,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2024-08-20 22:25:59,829 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9150, loss[loss=0.1047, beats_loss=0.009506, ecapa_loss=0.0001416, whisper_loss=0.09376, over 21337.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001368, whisper_loss=0.08863, over 3840727.06 frames. 
], batch size: 84, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:26:07,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4981190.0, ans=0.125 2024-08-20 22:26:17,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.28 vs. limit=22.5 2024-08-20 22:26:49,353 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 22:26:58,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.233e+01 2.459e+01 2.675e+01 3.716e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-20 22:27:20,616 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 22:27:23,686 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9200, loss[loss=0.1085, beats_loss=0.01099, ecapa_loss=0.0001203, whisper_loss=0.09636, over 23335.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.000137, whisper_loss=0.08924, over 3872854.35 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:27:26,020 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 22:27:33,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4981690.0, ans=0.2 2024-08-20 22:27:34,820 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 18 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 22:28:02,958 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 22:28:10,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4981890.0, ans=0.0 2024-08-20 22:28:20,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2024-08-20 22:28:31,320 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 22:28:48,016 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9250, loss[loss=0.0857, beats_loss=0.01034, ecapa_loss=0.0001933, whisper_loss=0.07343, over 14960.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001387, whisper_loss=0.08979, over 3791098.80 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:28:57,843 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.728e-01 2024-08-20 22:29:04,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4982290.0, ans=0.0 2024-08-20 22:29:09,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-20 22:29:21,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4982390.0, ans=0.125 2024-08-20 22:29:27,341 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 22:29:36,774 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
26 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 22:29:49,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.258e+01 2.506e+01 2.833e+01 3.659e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-20 22:30:02,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4982590.0, ans=0.125 2024-08-20 22:30:12,655 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 22:30:15,783 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9300, loss[loss=0.09327, beats_loss=0.01346, ecapa_loss=9.553e-05, whisper_loss=0.07885, over 22873.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001374, whisper_loss=0.08972, over 3797974.05 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:30:36,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4982790.0, ans=0.95 2024-08-20 22:30:43,703 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-08-20 22:30:47,465 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 22:31:00,791 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 22:31:05,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4982890.0, ans=0.0 2024-08-20 22:31:42,225 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9350, loss[loss=0.06936, beats_loss=0.01278, ecapa_loss=0.0001914, whisper_loss=0.05467, over 15774.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.000138, whisper_loss=0.08918, over 3779685.04 frames. 
], batch size: 72, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:31:53,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4983190.0, ans=0.125 2024-08-20 22:32:19,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-08-20 22:32:19,742 WARNING [optim.py:496] (3/4) Scaling gradients by 0.011810386553406715, model_norm_threshold=50.11380386352539 2024-08-20 22:32:19,911 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.320e+06, grad_sumsq=2.158e+08, orig_rms_sq=1.075e-02 2024-08-20 22:32:34,942 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 22:32:38,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4983490.0, ans=0.0 2024-08-20 22:32:41,198 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.343e+01 2.545e+01 2.927e+01 4.243e+03, threshold=5.090e+01, percent-clipped=3.0 2024-08-20 22:32:42,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4983490.0, ans=0.125 2024-08-20 22:32:44,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2024-08-20 22:32:49,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4983590.0, ans=0.125 2024-08-20 22:33:06,826 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9400, loss[loss=0.1028, beats_loss=0.01103, ecapa_loss=0.0001407, whisper_loss=0.09032, over 19553.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001382, whisper_loss=0.08959, over 3796304.28 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:33:10,477 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 22:33:21,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=4983690.0, ans=0.2 2024-08-20 22:33:22,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4983790.0, ans=0.125 2024-08-20 22:33:53,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2024-08-20 22:33:57,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4983890.0, ans=0.125 2024-08-20 22:34:12,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4983990.0, ans=0.0 2024-08-20 22:34:22,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4984090.0, ans=0.1 2024-08-20 22:34:27,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4984090.0, ans=0.0 2024-08-20 22:34:33,355 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9450, loss[loss=0.08552, beats_loss=0.01107, ecapa_loss=0.0001171, whisper_loss=0.07328, over 14519.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.09025, over 3812572.84 frames. ], batch size: 58, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:34:37,441 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
21 from LS+wenet, 21 from Vox, 16 fro AS
2024-08-20 22:34:46,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.88 vs. limit=6.0
2024-08-20 22:34:54,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4984290.0, ans=0.0
2024-08-20 22:34:59,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4984290.0, ans=0.0
2024-08-20 22:35:31,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4984490.0, ans=0.125
2024-08-20 22:35:32,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.335e+01 2.553e+01 2.784e+01 4.072e+01, threshold=5.106e+01, percent-clipped=0.0
2024-08-20 22:35:43,213 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS
2024-08-20 22:35:51,072 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 22:35:51,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=4984590.0, ans=0.2
2024-08-20 22:35:57,249 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 20 from LS+wenet, 24 from Vox, 47 fro AS
2024-08-20 22:35:58,728 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9500, loss[loss=0.08598, beats_loss=0.01294, ecapa_loss=0.0001174, whisper_loss=0.07186, over 22707.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01032, ecapa_loss=0.0001405, whisper_loss=0.08966, over 3803581.36 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:36:02,463 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 16 from Vox, 25 fro AS
2024-08-20 22:36:06,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0
2024-08-20 22:36:12,783 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 18 from Vox, 44 fro AS
2024-08-20 22:36:16,248 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 fro AS
2024-08-20 22:36:27,828 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS
2024-08-20 22:37:15,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0
2024-08-20 22:37:26,223 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9550, loss[loss=0.1102, beats_loss=0.01069, ecapa_loss=0.0001179, whisper_loss=0.09835, over 18787.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.09006, over 3811336.89 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:37:39,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0
2024-08-20 22:38:24,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4985490.0, ans=0.125
2024-08-20 22:38:24,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.187e+01 2.362e+01 2.605e+01 3.929e+01, threshold=4.725e+01, percent-clipped=0.0
2024-08-20 22:38:47,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4985590.0, ans=0.125
2024-08-20 22:38:50,482 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 16 from LS+wenet, 12 from Vox, 28 fro AS
2024-08-20 22:38:51,967 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9600, loss[loss=0.08866, beats_loss=0.01127, ecapa_loss=0.000114, whisper_loss=0.07625, over 14821.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.0001393, whisper_loss=0.08969, over 3779738.49 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:38:52,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4985690.0, ans=0.1
2024-08-20 22:39:07,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4985690.0, ans=0.0
2024-08-20 22:39:10,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4985790.0, ans=0.2
2024-08-20 22:39:24,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5
2024-08-20 22:39:28,457 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 29 from LS+wenet, 13 from Vox, 29 fro AS
2024-08-20 22:40:07,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4986090.0, ans=0.0
2024-08-20 22:40:11,790 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS
2024-08-20 22:40:21,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0
2024-08-20 22:40:21,437 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9650, loss[loss=0.1325, beats_loss=0.008676, ecapa_loss=0.0001759, whisper_loss=0.1221, over 22189.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001392, whisper_loss=0.0902, over 3821585.28 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:40:41,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0
2024-08-20 22:40:56,053 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 15 from LS+wenet, 17 from Vox, 26 fro AS
2024-08-20 22:41:03,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4986390.0, ans=0.0
2024-08-20 22:41:15,502 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS
2024-08-20 22:41:20,600 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 31 from LS+wenet, 22 from Vox, 34 fro AS
2024-08-20 22:41:22,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.643e+01 2.288e+01 2.415e+01 2.707e+01 3.907e+01, threshold=4.829e+01, percent-clipped=0.0
2024-08-20 22:41:28,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4986490.0, ans=0.0
2024-08-20 22:41:36,620 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 21 from Vox, 46 fro AS
2024-08-20 22:41:42,770 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS
2024-08-20 22:41:48,632 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9700, loss[loss=0.0705, beats_loss=0.01175, ecapa_loss=0.0001504, whisper_loss=0.05725, over 17735.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001399, whisper_loss=0.09007, over 3852334.56 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:42:10,112 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS
2024-08-20 22:42:16,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4986790.0, ans=0.125
2024-08-20 22:42:32,260 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 37 from LS+wenet, 23 from Vox, 33 fro AS
2024-08-20 22:43:15,064 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9750, loss[loss=0.06932, beats_loss=0.01224, ecapa_loss=0.00013, whisper_loss=0.05578, over 21606.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01036, ecapa_loss=0.0001402, whisper_loss=0.08958, over 3804169.66 frames. ], batch size: 95, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:43:17,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4987190.0, ans=0.04949747468305833
2024-08-20 22:43:29,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4987190.0, ans=0.125
2024-08-20 22:43:30,878 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS
2024-08-20 22:43:47,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4987290.0, ans=0.07
2024-08-20 22:43:50,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4987390.0, ans=0.1
2024-08-20 22:44:03,495 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS
2024-08-20 22:44:05,428 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS
2024-08-20 22:44:15,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.197e+01 2.422e+01 2.722e+01 3.881e+01, threshold=4.844e+01, percent-clipped=0.0
2024-08-20 22:44:18,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4987490.0, ans=0.0
2024-08-20 22:44:22,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0
2024-08-20 22:44:28,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4987590.0, ans=0.2
2024-08-20 22:44:31,594 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS
2024-08-20 22:44:41,305 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9800, loss[loss=0.09081, beats_loss=0.01354, ecapa_loss=0.0001245, whisper_loss=0.07602, over 18843.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0104, ecapa_loss=0.0001397, whisper_loss=0.08903, over 3795843.92 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:45:05,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4987790.0, ans=0.125
2024-08-20 22:45:13,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4987890.0, ans=0.125
2024-08-20 22:45:17,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0
2024-08-20 22:45:33,967 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 24 from LS+wenet, 21 from Vox, 30 fro AS
2024-08-20 22:45:36,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4987990.0, ans=0.125
2024-08-20 22:45:52,024 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS
2024-08-20 22:45:59,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4988090.0, ans=0.1
2024-08-20 22:46:04,170 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS
2024-08-20 22:46:06,832 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9850, loss[loss=0.09469, beats_loss=0.009405, ecapa_loss=0.0001363, whisper_loss=0.08392, over 13735.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001397, whisper_loss=0.08947, over 3786996.88 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:46:07,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4988190.0, ans=0.0
2024-08-20 22:46:13,222 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS
2024-08-20 22:46:27,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4988290.0, ans=0.0
2024-08-20 22:46:30,228 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 18 from LS+wenet, 17 from Vox, 32 fro AS
2024-08-20 22:46:33,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4988290.0, ans=0.0
2024-08-20 22:47:04,805 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.065e-02
2024-08-20 22:47:06,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4988490.0, ans=0.125
2024-08-20 22:47:07,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.349e+01 2.590e+01 2.982e+01 4.203e+01, threshold=5.180e+01, percent-clipped=0.0
2024-08-20 22:47:10,739 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 fro AS
2024-08-20 22:47:34,505 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9900, loss[loss=0.1072, beats_loss=0.01004, ecapa_loss=0.000155, whisper_loss=0.09564, over 15397.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.08901, over 3797877.36 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:47:45,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4988690.0, ans=10.0
2024-08-20 22:47:46,839 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 35 from Vox, 32 fro AS
2024-08-20 22:47:54,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.82 vs. limit=10.0
2024-08-20 22:47:56,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4988790.0, ans=0.2
2024-08-20 22:47:57,467 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS
2024-08-20 22:48:01,329 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 19 from LS+wenet, 16 from Vox, 17 fro AS
2024-08-20 22:48:12,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4988890.0, ans=0.0
2024-08-20 22:48:17,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=4988890.0, ans=0.2
2024-08-20 22:48:19,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0
2024-08-20 22:48:35,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4988990.0, ans=0.125
2024-08-20 22:48:38,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4988990.0, ans=0.125
2024-08-20 22:48:41,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4988990.0, ans=0.05
2024-08-20 22:48:46,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4989090.0, ans=0.2
2024-08-20 22:48:46,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4989090.0, ans=0.1
2024-08-20 22:48:53,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.81 vs. limit=22.5
2024-08-20 22:49:01,536 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 9950, loss[loss=0.09697, beats_loss=0.01081, ecapa_loss=0.0001382, whisper_loss=0.08478, over 15626.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.08913, over 3774402.90 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:49:15,162 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.906e+05
2024-08-20 22:49:25,917 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS
2024-08-20 22:49:40,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0
2024-08-20 22:49:41,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4989390.0, ans=0.1
2024-08-20 22:49:55,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0
2024-08-20 22:50:02,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.629e+01 2.261e+01 2.497e+01 2.811e+01 6.221e+01, threshold=4.994e+01, percent-clipped=1.0
2024-08-20 22:50:26,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4989590.0, ans=0.125
2024-08-20 22:50:28,713 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10000, loss[loss=0.09069, beats_loss=0.01425, ecapa_loss=0.0001024, whisper_loss=0.07542, over 23216.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001393, whisper_loss=0.08944, over 3794195.38 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17
2024-08-20 22:50:30,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4989690.0, ans=0.125
2024-08-20 22:50:58,673 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 17 from LS+wenet, 19 from Vox, 42 fro AS
2024-08-20 22:51:14,671 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 29 from LS+wenet, 23 from Vox, 27 fro AS
2024-08-20 22:51:20,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4989890.0, ans=0.0
2024-08-20 22:51:28,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0
2024-08-20 22:51:37,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.36 vs. limit=5.0
2024-08-20 22:51:52,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4990090.0, ans=0.0
2024-08-20 22:51:53,815 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS
2024-08-20 22:51:58,425 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10050, loss[loss=0.09428, beats_loss=0.009851, ecapa_loss=0.0001333, whisper_loss=0.0831, over 23083.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001388, whisper_loss=0.08912, over 3772017.12 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:52:00,434 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS
2024-08-20 22:52:06,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4990190.0, ans=0.125
2024-08-20 22:52:08,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4990190.0, ans=0.125
2024-08-20 22:52:11,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4990190.0, ans=0.125
2024-08-20 22:52:12,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0
2024-08-20 22:52:12,449 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07415100187063217, model_norm_threshold=49.94480514526367
2024-08-20 22:52:12,620 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.662e+05, grad_sumsq=1.662e+05, orig_rms_sq=1.000e+00
2024-08-20 22:52:45,773 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 17 from LS+wenet, 14 from Vox, 19 fro AS
2024-08-20 22:52:58,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0
2024-08-20 22:52:59,389 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS
2024-08-20 22:53:00,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.315e+01 2.541e+01 2.876e+01 6.736e+02, threshold=5.082e+01, percent-clipped=1.0
2024-08-20 22:53:11,462 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 22 from LS+wenet, 14 from Vox, 24 fro AS
2024-08-20 22:53:23,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4990590.0, ans=0.1
2024-08-20 22:53:28,276 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10100, loss[loss=0.1032, beats_loss=0.009935, ecapa_loss=0.0001191, whisper_loss=0.0921, over 18005.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.08963, over 3778430.86 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:53:35,244 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS
2024-08-20 22:53:45,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0
2024-08-20 22:54:13,822 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS
2024-08-20 22:54:17,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4990890.0, ans=0.0
2024-08-20 22:54:21,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4990990.0, ans=0.125
2024-08-20 22:54:33,180 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 25 from LS+wenet, 19 from Vox, 51 fro AS
2024-08-20 22:54:37,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4991090.0, ans=0.125
2024-08-20 22:54:40,044 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 11 from LS+wenet, 18 from Vox, 23 fro AS
2024-08-20 22:54:44,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4991090.0, ans=0.125
2024-08-20 22:54:54,998 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10150, loss[loss=0.1149, beats_loss=0.008904, ecapa_loss=0.000121, whisper_loss=0.1048, over 17141.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.09018, over 3777179.25 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:55:02,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4991190.0, ans=0.0
2024-08-20 22:55:03,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0
2024-08-20 22:55:16,297 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 20 from LS+wenet, 16 from Vox, 14 fro AS
2024-08-20 22:55:18,350 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 16 from LS+wenet, 23 from Vox, 31 fro AS
2024-08-20 22:55:23,133 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 19 from LS+wenet, 33 from Vox, 38 fro AS
2024-08-20 22:55:56,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.245e+01 2.529e+01 2.822e+01 1.463e+02, threshold=5.058e+01, percent-clipped=1.0
2024-08-20 22:55:57,457 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0
2024-08-20 22:56:11,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0
2024-08-20 22:56:17,600 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 22 from LS+wenet, 15 from Vox, 29 fro AS
2024-08-20 22:56:20,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4991590.0, ans=0.125
2024-08-20 22:56:22,413 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10200, loss[loss=0.09838, beats_loss=0.007727, ecapa_loss=0.0001565, whisper_loss=0.08909, over 16762.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001403, whisper_loss=0.09019, over 3776762.01 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:56:32,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4991690.0, ans=0.125
2024-08-20 22:56:42,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4991790.0, ans=0.0
2024-08-20 22:56:43,451 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 20 from LS+wenet, 20 from Vox, 20 fro AS
2024-08-20 22:57:54,899 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10250, loss[loss=0.09751, beats_loss=0.01253, ecapa_loss=0.0001218, whisper_loss=0.08375, over 18393.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001396, whisper_loss=0.08969, over 3808465.98 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:58:00,909 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 14 from LS+wenet, 13 from Vox, 30 fro AS
2024-08-20 22:58:08,253 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS
2024-08-20 22:58:40,239 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 20 from LS+wenet, 9 from Vox, 26 fro AS
2024-08-20 22:58:40,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4992390.0, ans=0.1
2024-08-20 22:58:44,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2024-08-20 22:58:49,310 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 15 from LS+wenet, 19 from Vox, 24 fro AS
2024-08-20 22:58:52,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4992490.0, ans=0.2
2024-08-20 22:58:56,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4992490.0, ans=0.2
2024-08-20 22:58:59,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.301e+01 2.546e+01 2.793e+01 3.961e+01, threshold=5.092e+01, percent-clipped=0.0
2024-08-20 22:59:26,746 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10300, loss[loss=0.1125, beats_loss=0.01076, ecapa_loss=0.0001408, whisper_loss=0.1003, over 23213.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.09073, over 3823943.36 frames. ], batch size: 94, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 22:59:32,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=12.0
2024-08-20 22:59:51,140 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 27 from LS+wenet, 13 from Vox, 34 fro AS
2024-08-20 22:59:52,884 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 12 from Vox, 34 fro AS
2024-08-20 22:59:58,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4992790.0, ans=0.2
2024-08-20 23:00:04,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4992890.0, ans=0.125
2024-08-20 23:00:06,737 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 22 from LS+wenet, 24 from Vox, 18 fro AS
2024-08-20 23:00:24,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4992990.0, ans=0.0
2024-08-20 23:00:26,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4992990.0, ans=0.0
2024-08-20 23:00:41,085 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-20 23:00:58,098 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10350, loss[loss=0.09506, beats_loss=0.01211, ecapa_loss=0.0001236, whisper_loss=0.08172, over 18096.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001402, whisper_loss=0.09007, over 3837906.75 frames. ], batch size: 74, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:00:59,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2024-08-20 23:01:04,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4993190.0, ans=0.125
2024-08-20 23:01:42,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4993390.0, ans=0.125
2024-08-20 23:01:44,034 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 34 from LS+wenet, 32 from Vox, 28 fro AS
2024-08-20 23:01:45,977 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS
2024-08-20 23:01:51,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.85 vs. limit=12.0
2024-08-20 23:01:55,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4993490.0, ans=0.125
2024-08-20 23:02:01,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.402e+01 2.644e+01 3.028e+01 6.335e+01, threshold=5.289e+01, percent-clipped=1.0
2024-08-20 23:02:01,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4993490.0, ans=0.0
2024-08-20 23:02:21,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4993590.0, ans=0.125
2024-08-20 23:02:26,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2024-08-20 23:02:27,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0
2024-08-20 23:02:29,171 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10400, loss[loss=0.09321, beats_loss=0.01288, ecapa_loss=9.279e-05, whisper_loss=0.07939, over 18654.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001413, whisper_loss=0.09025, over 3822039.72 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:02:37,799 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 29 from LS+wenet, 26 from Vox, 30 fro AS
2024-08-20 23:02:39,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4993690.0, ans=0.125
2024-08-20 23:02:41,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4993690.0, ans=0.5
2024-08-20 23:02:59,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4993790.0, ans=0.2
2024-08-20 23:03:09,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4993890.0, ans=0.0
2024-08-20 23:03:23,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4993990.0, ans=0.0
2024-08-20 23:03:26,755 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS
2024-08-20 23:03:31,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4993990.0, ans=0.125
2024-08-20 23:03:35,527 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 19 from LS+wenet, 15 from Vox, 31 fro AS
2024-08-20 23:03:39,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4993990.0, ans=0.125
2024-08-20 23:03:50,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0
2024-08-20 23:03:59,835 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10450, loss[loss=0.1082, beats_loss=0.007341, ecapa_loss=0.0001723, whisper_loss=0.0991, over 15909.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001412, whisper_loss=0.09028, over 3809533.10 frames. ], batch size: 67, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:04:09,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4994190.0, ans=0.0
2024-08-20 23:04:26,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4994290.0, ans=0.1
2024-08-20 23:04:35,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4994390.0, ans=0.0
2024-08-20 23:04:42,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4994390.0, ans=0.125
2024-08-20 23:05:01,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4994490.0, ans=0.125
2024-08-20 23:05:04,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.282e+01 2.461e+01 2.662e+01 8.138e+01, threshold=4.922e+01, percent-clipped=1.0
2024-08-20 23:05:11,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4994490.0, ans=0.1
2024-08-20 23:05:31,037 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10500, loss[loss=0.1051, beats_loss=0.01105, ecapa_loss=0.0001291, whisper_loss=0.09275, over 23210.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001401, whisper_loss=0.09085, over 3815300.96 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:05:50,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4994790.0, ans=0.1
2024-08-20 23:05:59,927 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 29 from Vox, 19 fro AS
2024-08-20 23:06:09,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4994890.0, ans=0.125
2024-08-20 23:06:32,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0
2024-08-20 23:06:32,862 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS
2024-08-20 23:06:54,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4995090.0, ans=0.0
2024-08-20 23:06:58,690 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10550, loss[loss=0.09323, beats_loss=0.009578, ecapa_loss=0.0001908, whisper_loss=0.08174, over 13350.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001401, whisper_loss=0.09039, over 3840155.34 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 23:07:13,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0
2024-08-20 23:07:18,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4995290.0, ans=0.125
2024-08-20 23:07:21,871 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 32 from LS+wenet, 16 from Vox, 46 fro AS
2024-08-20 23:07:25,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4995290.0, ans=0.0
2024-08-20 23:07:37,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2024-08-20 23:07:39,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4995390.0, ans=0.025
2024-08-20 23:07:47,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4995390.0, ans=0.015
2024-08-20 23:07:59,155 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.287e+01 2.503e+01 2.760e+01 4.390e+01, threshold=5.007e+01, percent-clipped=0.0
2024-08-20 23:07:59,917 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS
2024-08-20 23:08:01,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4995490.0, ans=0.125
2024-08-20 23:08:10,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4995590.0, ans=0.2
2024-08-20 23:08:14,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4995590.0, ans=0.0
2024-08-20 23:08:20,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4995590.0, ans=0.1
2024-08-20 23:08:25,602 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10600, loss[loss=0.1105, beats_loss=0.008659, ecapa_loss=0.0001425, whisper_loss=0.1004, over 20142.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001402, whisper_loss=0.08997, over 3840117.51 frames.
], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:08:40,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4995690.0, ans=0.1 2024-08-20 23:08:44,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4995790.0, ans=0.07 2024-08-20 23:08:48,891 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 23:08:56,409 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 23 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-20 23:09:00,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4995790.0, ans=0.1 2024-08-20 23:09:05,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4995890.0, ans=0.125 2024-08-20 23:09:08,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4995890.0, ans=0.2 2024-08-20 23:09:09,006 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 23:09:17,482 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 23:09:17,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4995890.0, ans=0.0 2024-08-20 23:09:54,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-20 23:09:59,563 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10650, loss[loss=0.1043, beats_loss=0.009413, ecapa_loss=0.0001332, whisper_loss=0.09355, over 18365.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.08958, over 3823712.68 frames. ], batch size: 73, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:10:05,523 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 23:10:07,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4996190.0, ans=0.125 2024-08-20 23:10:25,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4996290.0, ans=0.125 2024-08-20 23:11:06,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.258e+01 2.484e+01 2.690e+01 6.351e+01, threshold=4.969e+01, percent-clipped=1.0 2024-08-20 23:11:24,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4996590.0, ans=0.2 2024-08-20 23:11:33,155 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10700, loss[loss=0.06124, beats_loss=0.01434, ecapa_loss=0.0001016, whisper_loss=0.04588, over 13357.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001382, whisper_loss=0.08965, over 3795448.46 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:11:39,147 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 23:12:04,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4996790.0, ans=0.1 2024-08-20 23:12:11,933 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 23:12:18,981 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
34 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 23:12:32,387 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:12:35,558 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 23:12:43,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4996990.0, ans=0.125 2024-08-20 23:12:46,533 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 18 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-20 23:12:52,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4997090.0, ans=0.0 2024-08-20 23:13:03,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4997090.0, ans=0.125 2024-08-20 23:13:06,188 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10750, loss[loss=0.09325, beats_loss=0.008936, ecapa_loss=0.0001938, whisper_loss=0.08237, over 20048.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001369, whisper_loss=0.08946, over 3778164.17 frames. 
], batch size: 88, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:13:47,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4997390.0, ans=0.125 2024-08-20 23:13:55,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4997390.0, ans=0.1 2024-08-20 23:14:03,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4997390.0, ans=0.95 2024-08-20 23:14:05,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4997390.0, ans=0.125 2024-08-20 23:14:16,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4997490.0, ans=0.1 2024-08-20 23:14:18,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.299e+01 2.568e+01 2.833e+01 1.794e+02, threshold=5.137e+01, percent-clipped=1.0 2024-08-20 23:14:20,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4997490.0, ans=10.0 2024-08-20 23:14:33,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4997590.0, ans=0.125 2024-08-20 23:14:42,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4997590.0, ans=0.125 2024-08-20 23:14:46,996 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10800, loss[loss=0.09191, beats_loss=0.007509, ecapa_loss=0.0001665, whisper_loss=0.08273, over 16703.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01059, ecapa_loss=0.0001368, whisper_loss=0.08918, over 3801482.38 frames. 
], batch size: 66, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:15:10,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4997790.0, ans=0.0 2024-08-20 23:15:17,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4997790.0, ans=0.125 2024-08-20 23:15:23,722 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 23:15:35,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2024-08-20 23:16:03,038 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 23:16:04,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4998090.0, ans=0.0 2024-08-20 23:16:10,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.61 vs. limit=6.0 2024-08-20 23:16:13,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4998090.0, ans=0.1 2024-08-20 23:16:20,528 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10850, loss[loss=0.09462, beats_loss=0.009723, ecapa_loss=0.0001674, whisper_loss=0.08322, over 20947.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001369, whisper_loss=0.0907, over 3826997.75 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:16:46,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4998290.0, ans=0.2 2024-08-20 23:16:56,804 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
30 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-20 23:17:05,823 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 22 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 23:17:15,409 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 25 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 23:17:21,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-20 23:17:24,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.245e+01 2.613e+01 2.967e+01 9.176e+01, threshold=5.227e+01, percent-clipped=1.0 2024-08-20 23:17:27,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2024-08-20 23:17:28,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4998490.0, ans=0.1 2024-08-20 23:17:34,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4998590.0, ans=0.125 2024-08-20 23:17:45,727 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 23:17:48,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-08-20 23:17:51,202 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10900, loss[loss=0.09874, beats_loss=0.01201, ecapa_loss=0.0001441, whisper_loss=0.08529, over 21801.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001377, whisper_loss=0.09061, over 3801856.28 frames. 
], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:17:57,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4998690.0, ans=0.1 2024-08-20 23:17:57,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=12.0 2024-08-20 23:18:03,894 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 23:18:06,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4998690.0, ans=0.125 2024-08-20 23:18:06,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4998690.0, ans=0.0 2024-08-20 23:18:13,079 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 23:18:44,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4998990.0, ans=0.07 2024-08-20 23:18:54,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4998990.0, ans=0.2 2024-08-20 23:18:54,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4998990.0, ans=0.0 2024-08-20 23:19:05,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4999090.0, ans=10.0 2024-08-20 23:19:18,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4999090.0, ans=0.0 2024-08-20 23:19:19,237 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 23:19:21,347 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 10950, loss[loss=0.08387, beats_loss=0.01317, ecapa_loss=0.0001254, whisper_loss=0.06945, over 17105.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01028, ecapa_loss=0.0001392, whisper_loss=0.09131, over 3786727.53 frames. ], batch size: 69, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:19:36,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-08-20 23:19:38,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-20 23:20:25,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.300e+01 2.546e+01 2.869e+01 3.848e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-20 23:20:45,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-20 23:20:46,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4999590.0, ans=0.125 2024-08-20 23:20:47,372 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 20 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 23:20:52,493 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11000, loss[loss=0.1147, beats_loss=0.00842, ecapa_loss=0.000127, whisper_loss=0.105, over 17939.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.000138, whisper_loss=0.09134, over 3797134.94 frames. ], batch size: 70, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:21:11,663 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 23:21:26,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.84 vs. limit=22.5 2024-08-20 23:21:31,810 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 35 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 23:21:32,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4999890.0, ans=0.0 2024-08-20 23:22:09,075 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-20 23:22:27,405 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11050, loss[loss=0.1127, beats_loss=0.00887, ecapa_loss=0.0001319, whisper_loss=0.1026, over 23137.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0103, ecapa_loss=0.0001393, whisper_loss=0.09117, over 3811482.03 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:22:33,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5000190.0, ans=0.125 2024-08-20 23:22:44,653 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-20 23:22:56,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.26 vs. limit=10.0 2024-08-20 23:23:32,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.245e+01 2.536e+01 2.821e+01 3.723e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-20 23:23:50,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.99 vs. 
limit=10.0 2024-08-20 23:24:01,822 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11100, loss[loss=0.1191, beats_loss=0.009115, ecapa_loss=0.0001236, whisper_loss=0.1087, over 15386.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.00014, whisper_loss=0.09078, over 3812108.11 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:24:12,168 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 14 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 23:24:31,051 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 25 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 23:25:10,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5000990.0, ans=0.125 2024-08-20 23:25:13,396 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 23:25:13,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5000990.0, ans=0.0 2024-08-20 23:25:23,261 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.432e-03 2024-08-20 23:25:40,220 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11150, loss[loss=0.09363, beats_loss=0.01096, ecapa_loss=0.0001514, whisper_loss=0.08115, over 21071.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.09038, over 3842292.06 frames. 
], batch size: 91, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:26:44,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.389e+01 2.664e+01 3.018e+01 8.039e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-20 23:26:47,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5001490.0, ans=0.0 2024-08-20 23:26:50,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5001490.0, ans=0.125 2024-08-20 23:26:58,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5001590.0, ans=0.125 2024-08-20 23:27:01,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5001590.0, ans=0.5 2024-08-20 23:27:14,925 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11200, loss[loss=0.09629, beats_loss=0.0112, ecapa_loss=0.0001829, whisper_loss=0.08327, over 18314.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001395, whisper_loss=0.09002, over 3872604.53 frames. ], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:27:46,417 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
27 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 23:28:01,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5001890.0, ans=0.1 2024-08-20 23:28:02,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5001890.0, ans=0.125 2024-08-20 23:28:03,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5001890.0, ans=0.125 2024-08-20 23:28:08,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5001890.0, ans=0.125 2024-08-20 23:28:15,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5001990.0, ans=0.1 2024-08-20 23:28:32,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=5002090.0, ans=0.05 2024-08-20 23:28:37,911 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2024-08-20 23:28:39,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5002090.0, ans=0.125 2024-08-20 23:28:47,409 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11250, loss[loss=0.09368, beats_loss=0.01261, ecapa_loss=0.0001189, whisper_loss=0.07988, over 22573.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09046, over 3854860.48 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:28:55,854 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-20 23:28:56,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5002190.0, ans=0.0 2024-08-20 23:29:45,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5002490.0, ans=0.2 2024-08-20 23:29:54,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.254e+01 2.512e+01 2.929e+01 3.894e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-20 23:29:59,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-20 23:30:05,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=22.5 2024-08-20 23:30:08,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.12 vs. limit=15.0 2024-08-20 23:30:14,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5002590.0, ans=0.015 2024-08-20 23:30:22,470 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11300, loss[loss=0.09469, beats_loss=0.008144, ecapa_loss=0.0001466, whisper_loss=0.08508, over 16405.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001395, whisper_loss=0.0901, over 3868922.35 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:30:37,788 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 23:31:04,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5002890.0, ans=0.125 2024-08-20 23:31:55,071 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
26 from LS+wenet, 15 from Vox, 53 fro AS 2024-08-20 23:32:09,477 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11350, loss[loss=0.09521, beats_loss=0.01007, ecapa_loss=0.0001031, whisper_loss=0.08411, over 14466.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001395, whisper_loss=0.08988, over 3854914.25 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:32:14,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=12.0 2024-08-20 23:32:17,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5003190.0, ans=0.95 2024-08-20 23:32:19,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.77 vs. limit=22.5 2024-08-20 23:32:23,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5003190.0, ans=0.2 2024-08-20 23:32:34,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5003290.0, ans=0.0 2024-08-20 23:32:57,217 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-20 23:33:16,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.267e+01 2.559e+01 2.926e+01 1.468e+02, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 23:33:35,193 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 19 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 23:33:40,173 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 24 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 23:33:43,994 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11400, loss[loss=0.09077, beats_loss=0.009442, ecapa_loss=0.0001474, whisper_loss=0.07985, over 15176.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01029, ecapa_loss=0.0001386, whisper_loss=0.09075, over 3839207.50 frames. ], batch size: 57, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:33:44,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-08-20 23:33:50,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2024-08-20 23:34:07,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5003790.0, ans=0.125 2024-08-20 23:34:14,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-20 23:34:15,183 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 31 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 23:34:42,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5003990.0, ans=0.125 2024-08-20 23:35:16,172 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11450, loss[loss=0.09896, beats_loss=0.009174, ecapa_loss=0.0001925, whisper_loss=0.08787, over 20575.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01014, ecapa_loss=0.0001401, whisper_loss=0.09138, over 3802603.08 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:35:23,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5004190.0, ans=0.1 2024-08-20 23:35:30,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5004190.0, ans=0.125 2024-08-20 23:35:38,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5004290.0, ans=0.125 2024-08-20 23:35:57,774 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 19 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-20 23:36:09,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.25 vs. limit=22.5 2024-08-20 23:36:10,336 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-20 23:36:20,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5004490.0, ans=0.125 2024-08-20 23:36:31,464 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.263e+01 2.473e+01 2.920e+01 3.744e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-20 23:36:55,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5004590.0, ans=0.2 2024-08-20 23:36:59,828 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11500, loss[loss=0.09722, beats_loss=0.0116, ecapa_loss=0.0001261, whisper_loss=0.08436, over 14329.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01016, ecapa_loss=0.0001408, whisper_loss=0.09166, over 3809853.94 frames. 
], batch size: 59, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:37:10,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5004690.0, ans=0.2 2024-08-20 23:37:13,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5004690.0, ans=0.0 2024-08-20 23:37:43,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=5004890.0, ans=10.0 2024-08-20 23:37:51,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5004890.0, ans=0.125 2024-08-20 23:38:11,452 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:38:19,938 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 24 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 23:38:38,282 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11550, loss[loss=0.1064, beats_loss=0.00916, ecapa_loss=0.0001159, whisper_loss=0.09608, over 21018.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01018, ecapa_loss=0.0001394, whisper_loss=0.09156, over 3802880.46 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:38:48,362 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 23:38:56,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5005290.0, ans=0.125 2024-08-20 23:39:02,305 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 23:39:05,984 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 
20 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-20 23:39:12,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-08-20 23:39:17,983 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:39:20,919 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 21 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-20 23:39:39,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5005490.0, ans=0.1 2024-08-20 23:39:39,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5005490.0, ans=0.125 2024-08-20 23:39:45,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.283e+01 2.508e+01 2.789e+01 4.307e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 23:39:53,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5005590.0, ans=0.05 2024-08-20 23:40:16,513 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11600, loss[loss=0.1204, beats_loss=0.008693, ecapa_loss=0.0001706, whisper_loss=0.11, over 22850.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01029, ecapa_loss=0.0001376, whisper_loss=0.09136, over 3824380.71 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:40:17,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5005690.0, ans=10.0 2024-08-20 23:40:19,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5005690.0, ans=0.1 2024-08-20 23:40:20,840 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 
34 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 23:40:37,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5005790.0, ans=0.125 2024-08-20 23:40:57,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5005890.0, ans=0.125 2024-08-20 23:41:28,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5005990.0, ans=0.09899494936611666 2024-08-20 23:41:34,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5006090.0, ans=0.125 2024-08-20 23:41:44,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5006090.0, ans=0.04949747468305833 2024-08-20 23:41:50,157 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11650, loss[loss=0.1174, beats_loss=0.009793, ecapa_loss=0.0001538, whisper_loss=0.106, over 22465.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01027, ecapa_loss=0.0001384, whisper_loss=0.09185, over 3824698.53 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:42:31,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5006390.0, ans=0.0 2024-08-20 23:42:40,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5006390.0, ans=0.125 2024-08-20 23:42:42,230 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 
26 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-20 23:42:46,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5006490.0, ans=0.0 2024-08-20 23:42:53,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.472e+01 2.706e+01 2.968e+01 4.797e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-20 23:42:56,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5006490.0, ans=0.125 2024-08-20 23:43:11,982 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2024-08-20 23:43:21,206 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11700, loss[loss=0.0833, beats_loss=0.01186, ecapa_loss=0.0001517, whisper_loss=0.06992, over 17536.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01025, ecapa_loss=0.00014, whisper_loss=0.09154, over 3823502.56 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:43:36,105 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 23:43:41,858 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 23:43:46,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5006790.0, ans=0.2 2024-08-20 23:44:09,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.63 vs. limit=22.5 2024-08-20 23:44:17,727 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
16 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 23:44:18,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5006990.0, ans=0.1 2024-08-20 23:44:22,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-20 23:44:38,973 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-20 23:44:57,131 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11750, loss[loss=0.09902, beats_loss=0.01063, ecapa_loss=0.000122, whisper_loss=0.08717, over 14424.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01025, ecapa_loss=0.0001399, whisper_loss=0.09181, over 3859436.03 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:45:35,602 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08465281873941422, model_norm_threshold=54.12553405761719 2024-08-20 23:45:35,772 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.07, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.880e+04, grad_sumsq=2.880e+04, orig_rms_sq=1.000e+00 2024-08-20 23:45:51,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5007490.0, ans=0.05 2024-08-20 23:45:51,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5007490.0, ans=0.125 2024-08-20 23:45:56,251 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 23:46:01,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.339e+01 2.549e+01 2.970e+01 6.394e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-20 23:46:14,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5007590.0, ans=0.1 2024-08-20 23:46:20,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5007590.0, ans=0.125 2024-08-20 23:46:32,072 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11800, loss[loss=0.07847, beats_loss=0.01374, ecapa_loss=0.0001128, whisper_loss=0.06361, over 17080.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01024, ecapa_loss=0.000139, whisper_loss=0.09166, over 3834734.72 frames. ], batch size: 69, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:46:32,226 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 23:46:35,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5007690.0, ans=0.1 2024-08-20 23:46:37,640 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 23:46:48,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.54 vs. 
limit=10.0 2024-08-20 23:46:52,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5007790.0, ans=0.2 2024-08-20 23:47:21,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5007890.0, ans=0.2 2024-08-20 23:47:35,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5007990.0, ans=0.125 2024-08-20 23:47:38,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0 2024-08-20 23:48:01,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5008090.0, ans=0.0 2024-08-20 23:48:08,564 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 23:48:08,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5008090.0, ans=0.0 2024-08-20 23:48:11,484 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11850, loss[loss=0.1008, beats_loss=0.008657, ecapa_loss=0.0001225, whisper_loss=0.09097, over 14477.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01021, ecapa_loss=0.0001394, whisper_loss=0.09185, over 3811006.54 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:48:14,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5008190.0, ans=0.0 2024-08-20 23:48:18,126 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-20 23:48:25,902 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 23:48:40,718 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
15 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 23:48:49,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5008290.0, ans=0.125 2024-08-20 23:49:20,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.312e+01 2.598e+01 2.861e+01 4.202e+01, threshold=5.196e+01, percent-clipped=0.0 2024-08-20 23:49:51,658 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11900, loss[loss=0.1082, beats_loss=0.009078, ecapa_loss=0.0001139, whisper_loss=0.09801, over 17892.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01018, ecapa_loss=0.0001398, whisper_loss=0.09201, over 3817640.26 frames. ], batch size: 66, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:49:53,859 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 23:50:12,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-08-20 23:50:18,446 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 14 from Vox, 43 fro AS 2024-08-20 23:50:28,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5008890.0, ans=0.1 2024-08-20 23:50:30,749 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.00 vs. limit=10.0 2024-08-20 23:50:38,405 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 23:50:50,104 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 23:51:17,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5009090.0, ans=0.2 2024-08-20 23:51:23,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5009090.0, ans=0.125 2024-08-20 23:51:29,140 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 11950, loss[loss=0.1005, beats_loss=0.01073, ecapa_loss=0.0001358, whisper_loss=0.08837, over 22155.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.0907, over 3845780.91 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:51:59,525 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 23:52:03,380 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 29 from Vox, 18 fro AS 2024-08-20 23:52:31,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2024-08-20 23:52:40,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.309e+01 2.519e+01 2.758e+01 3.846e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 23:52:42,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.78 vs. limit=10.0 2024-08-20 23:52:43,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5009490.0, ans=0.0 2024-08-20 23:53:07,263 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12000, loss[loss=0.09605, beats_loss=0.01051, ecapa_loss=0.000138, whisper_loss=0.08415, over 17679.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.09047, over 3856621.70 frames. 
], batch size: 73, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:53:07,264 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-20 23:53:44,062 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2522, over 931116.00 frames. 2024-08-20 23:54:09,199 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on SV_voxceleb1: loss=0.003964, beats_loss=0, ecapa_loss=0.0003964, whisper_loss=0, over 944235.00 frames. 2024-08-20 23:55:45,775 INFO [train_multi_KD3.py:1150] (3/4) Epoch 34, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 23:55:45,779 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-20 23:55:46,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5009690.0, ans=0.2 2024-08-20 23:55:54,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=12.0 2024-08-20 23:56:14,806 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 18 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-20 23:56:21,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=12.0 2024-08-20 23:56:30,621 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
35 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 23:56:36,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5009990.0, ans=0.125 2024-08-20 23:56:43,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5009990.0, ans=0.125 2024-08-20 23:56:43,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5009990.0, ans=0.125 2024-08-20 23:57:11,222 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12050, loss[loss=0.1124, beats_loss=0.009803, ecapa_loss=0.0001369, whisper_loss=0.1013, over 21776.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.000139, whisper_loss=0.09007, over 3864061.77 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:57:14,538 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.963e+00 2024-08-20 23:57:27,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=5010190.0, ans=0.02 2024-08-20 23:58:02,323 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 23:58:06,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5010490.0, ans=0.2 2024-08-20 23:58:08,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2024-08-20 23:58:11,435 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
26 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 23:58:14,124 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.275e+01 2.516e+01 2.859e+01 1.031e+02, threshold=5.032e+01, percent-clipped=2.0 2024-08-20 23:58:18,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=12.0 2024-08-20 23:58:20,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5010490.0, ans=0.0 2024-08-20 23:58:33,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5010590.0, ans=0.035 2024-08-20 23:58:34,776 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 23:58:39,943 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12100, loss[loss=0.1001, beats_loss=0.01144, ecapa_loss=0.0001327, whisper_loss=0.08734, over 15782.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.00014, whisper_loss=0.08983, over 3883680.59 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:58:44,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5010690.0, ans=0.125 2024-08-20 23:58:48,642 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 23:59:14,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5010890.0, ans=0.035 2024-08-20 23:59:15,844 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
19 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-20 23:59:21,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5010890.0, ans=0.125 2024-08-20 23:59:23,942 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 40 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 23:59:34,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5010990.0, ans=0.1 2024-08-21 00:00:06,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5011090.0, ans=0.0 2024-08-21 00:00:11,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5011190.0, ans=0.2 2024-08-21 00:00:12,313 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12150, loss[loss=0.1085, beats_loss=0.00931, ecapa_loss=0.0001666, whisper_loss=0.09748, over 21109.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001412, whisper_loss=0.09022, over 3902116.96 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:00:45,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-08-21 00:01:03,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5011390.0, ans=0.125 2024-08-21 00:01:12,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5011490.0, ans=0.125 2024-08-21 00:01:18,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.230e+01 2.539e+01 2.959e+01 2.449e+02, threshold=5.079e+01, percent-clipped=2.0 2024-08-21 00:01:32,334 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 
27 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-21 00:01:46,157 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12200, loss[loss=0.1003, beats_loss=0.008856, ecapa_loss=0.000197, whisper_loss=0.08949, over 20874.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001411, whisper_loss=0.09051, over 3861460.64 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:01:51,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5011690.0, ans=0.125 2024-08-21 00:01:56,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5011690.0, ans=0.0 2024-08-21 00:01:59,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-08-21 00:01:59,709 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 25 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-21 00:02:06,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-08-21 00:02:18,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5011790.0, ans=0.125 2024-08-21 00:02:21,629 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 00:02:23,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5011790.0, ans=0.125 2024-08-21 00:02:42,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. 
limit=15.0 2024-08-21 00:02:47,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5011890.0, ans=0.125 2024-08-21 00:03:22,533 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-21 00:03:34,190 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12250, loss[loss=0.07003, beats_loss=0.01554, ecapa_loss=8.546e-05, whisper_loss=0.05363, over 14772.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001403, whisper_loss=0.08988, over 3828044.83 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:03:50,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5012190.0, ans=0.1 2024-08-21 00:03:50,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-08-21 00:03:58,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5012290.0, ans=0.125 2024-08-21 00:04:12,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5012390.0, ans=0.0 2024-08-21 00:04:13,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5012390.0, ans=0.125 2024-08-21 00:04:20,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5012390.0, ans=0.1 2024-08-21 00:04:33,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. 
limit=6.0 2024-08-21 00:04:38,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.216e+01 2.475e+01 2.860e+01 1.621e+02, threshold=4.950e+01, percent-clipped=3.0 2024-08-21 00:04:43,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-21 00:05:05,781 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12300, loss[loss=0.1256, beats_loss=0.008982, ecapa_loss=0.0001224, whisper_loss=0.1154, over 24107.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001395, whisper_loss=0.08998, over 3828546.65 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:05:07,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5012690.0, ans=0.0 2024-08-21 00:05:14,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-21 00:05:19,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5012690.0, ans=0.125 2024-08-21 00:05:22,877 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 32 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 00:05:26,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5012790.0, ans=0.125 2024-08-21 00:05:34,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5012790.0, ans=0.2 2024-08-21 00:05:51,592 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 00:06:27,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5013090.0, ans=0.125 2024-08-21 00:06:43,076 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12350, loss[loss=0.09028, beats_loss=0.01151, ecapa_loss=0.0001609, whisper_loss=0.07716, over 17691.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.09003, over 3830307.14 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:06:45,651 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 15 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-21 00:06:55,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-21 00:07:08,070 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0743364468216896, model_norm_threshold=49.50318145751953 2024-08-21 00:07:08,240 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.289e+04, grad_sumsq=4.289e+04, orig_rms_sq=1.000e+00 2024-08-21 00:07:10,641 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-21 00:07:13,021 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-21 00:07:45,209 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.283e+01 2.548e+01 2.937e+01 6.659e+02, threshold=5.096e+01, percent-clipped=4.0 2024-08-21 00:07:58,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5013590.0, ans=0.125 2024-08-21 00:08:03,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5013590.0, ans=0.0 2024-08-21 00:08:09,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5013590.0, ans=0.0 2024-08-21 00:08:12,658 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12400, loss[loss=0.1208, beats_loss=0.01004, ecapa_loss=0.0001894, whisper_loss=0.1089, over 15359.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.00014, whisper_loss=0.08966, over 3799332.63 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:08:46,384 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 13 from LS+wenet, 9 from Vox, 32 fro AS 2024-08-21 00:09:03,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5013890.0, ans=0.1 2024-08-21 00:09:24,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-08-21 00:09:24,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-08-21 00:09:37,524 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 30 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 00:09:42,617 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 00:09:46,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5014190.0, ans=0.0 2024-08-21 00:09:47,376 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12450, loss[loss=0.09817, beats_loss=0.01063, ecapa_loss=0.0001518, whisper_loss=0.08602, over 22399.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.000139, whisper_loss=0.08968, over 3784587.99 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:10:04,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5014290.0, ans=0.125 2024-08-21 00:10:07,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5014290.0, ans=0.0 2024-08-21 00:10:31,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5014390.0, ans=0.0 2024-08-21 00:10:45,137 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 31 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-21 00:10:46,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5014490.0, ans=0.125 2024-08-21 00:10:51,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.484e+01 2.743e+01 3.672e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-21 00:11:04,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5014590.0, ans=0.0 2024-08-21 00:11:19,805 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12500, loss[loss=0.09971, beats_loss=0.009345, ecapa_loss=0.0001386, whisper_loss=0.08898, over 12981.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001387, whisper_loss=0.08918, over 3777109.57 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:11:22,125 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.960e-01 2024-08-21 00:11:28,681 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-21 00:12:16,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5014890.0, ans=0.1 2024-08-21 00:12:36,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5015090.0, ans=0.0 2024-08-21 00:12:55,673 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12550, loss[loss=0.111, beats_loss=0.008321, ecapa_loss=0.000132, whisper_loss=0.1014, over 20129.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.08969, over 3785091.76 frames. ], batch size: 78, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:13:02,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5015190.0, ans=0.1 2024-08-21 00:13:19,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5015290.0, ans=0.125 2024-08-21 00:13:21,012 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 00:13:22,947 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 00:13:39,257 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 00:13:40,745 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-21 00:13:50,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5015490.0, ans=0.2 2024-08-21 00:13:51,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5015490.0, ans=0.95 2024-08-21 00:14:00,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.267e+01 2.470e+01 2.808e+01 4.015e+01, threshold=4.940e+01, percent-clipped=0.0 2024-08-21 00:14:13,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.33 vs. limit=22.5 2024-08-21 00:14:19,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5015590.0, ans=0.0 2024-08-21 00:14:27,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5015690.0, ans=0.2 2024-08-21 00:14:28,239 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12600, loss[loss=0.1231, beats_loss=0.008684, ecapa_loss=0.000153, whisper_loss=0.1128, over 22252.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001399, whisper_loss=0.08972, over 3799529.38 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:14:31,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5015690.0, ans=0.125 2024-08-21 00:14:38,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5015690.0, ans=0.2 2024-08-21 00:14:47,479 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-21 00:14:53,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5015790.0, ans=0.125 2024-08-21 00:15:01,543 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-21 00:15:13,443 WARNING [optim.py:496] (3/4) Scaling gradients by 0.00705720903351903, model_norm_threshold=49.39711380004883 2024-08-21 00:15:13,608 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.280e+06, grad_sumsq=7.678e+08, orig_rms_sq=1.078e-02 2024-08-21 00:15:28,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5015990.0, ans=0.125 2024-08-21 00:15:35,323 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-21 00:15:51,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5016090.0, ans=0.125 2024-08-21 00:16:01,772 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12650, loss[loss=0.1299, beats_loss=0.008407, ecapa_loss=0.0001706, whisper_loss=0.1198, over 22379.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01033, ecapa_loss=0.0001394, whisper_loss=0.09002, over 3790091.22 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:16:38,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5016390.0, ans=0.07 2024-08-21 00:16:40,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5016390.0, ans=0.0 2024-08-21 00:16:42,179 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-21 00:16:49,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5016390.0, ans=0.125 2024-08-21 00:17:06,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5016490.0, ans=0.125 2024-08-21 00:17:07,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.324e+01 2.529e+01 2.803e+01 7.000e+03, threshold=5.059e+01, percent-clipped=4.0 2024-08-21 00:17:08,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5016490.0, ans=0.0 2024-08-21 00:17:38,228 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12700, loss[loss=0.09763, beats_loss=0.006434, ecapa_loss=0.0001478, whisper_loss=0.08972, over 13976.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01022, ecapa_loss=0.0001388, whisper_loss=0.0906, over 3810620.43 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:17:42,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5016690.0, ans=0.0 2024-08-21 00:17:48,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-21 00:18:16,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.50 vs. 
limit=15.0 2024-08-21 00:18:25,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5016890.0, ans=0.1 2024-08-21 00:18:27,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5016890.0, ans=0.0 2024-08-21 00:18:39,129 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 36 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 00:18:47,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2024-08-21 00:19:11,154 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12750, loss[loss=0.08743, beats_loss=0.009983, ecapa_loss=0.0001247, whisper_loss=0.0762, over 15337.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001382, whisper_loss=0.08924, over 3808145.20 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:19:11,396 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 00:19:22,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5017190.0, ans=0.125 2024-08-21 00:20:08,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5017490.0, ans=0.125 2024-08-21 00:20:12,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5017490.0, ans=0.5 2024-08-21 00:20:19,295 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.263e+01 2.506e+01 2.738e+01 4.032e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-21 00:20:29,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5017590.0, ans=0.125 2024-08-21 00:20:31,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5017590.0, ans=0.125 2024-08-21 00:20:36,613 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 15 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 00:20:44,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5017590.0, ans=0.125 2024-08-21 00:20:46,989 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12800, loss[loss=0.07967, beats_loss=0.01087, ecapa_loss=0.0001555, whisper_loss=0.06725, over 20576.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001383, whisper_loss=0.089, over 3797685.11 frames. 
], batch size: 92, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:20:56,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5017690.0, ans=0.1 2024-08-21 00:20:59,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2024-08-21 00:21:51,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=15.0 2024-08-21 00:21:53,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5017990.0, ans=0.125 2024-08-21 00:22:29,461 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12850, loss[loss=0.1015, beats_loss=0.008337, ecapa_loss=0.0001554, whisper_loss=0.0916, over 21870.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001388, whisper_loss=0.08918, over 3812159.18 frames. ], batch size: 85, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:22:48,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5018290.0, ans=0.1 2024-08-21 00:23:21,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5018390.0, ans=0.125 2024-08-21 00:23:38,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.266e+01 2.519e+01 2.755e+01 3.962e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-21 00:23:41,471 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 00:23:45,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5018490.0, ans=0.125 2024-08-21 00:23:56,458 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-21 00:23:59,175 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-21 00:24:11,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5018690.0, ans=0.125 2024-08-21 00:24:12,762 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12900, loss[loss=0.1398, beats_loss=0.006713, ecapa_loss=0.0001441, whisper_loss=0.1317, over 18595.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08905, over 3779926.28 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:24:29,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5018790.0, ans=0.0 2024-08-21 00:25:06,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5018890.0, ans=0.0 2024-08-21 00:25:30,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5019090.0, ans=0.04949747468305833 2024-08-21 00:25:37,008 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 37 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-21 00:25:41,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5019090.0, ans=0.0 2024-08-21 00:25:48,043 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 12950, loss[loss=0.1075, beats_loss=0.008741, ecapa_loss=0.0001567, whisper_loss=0.09722, over 21165.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.08965, over 3800342.21 frames. ], batch size: 84, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:25:51,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=12.0 2024-08-21 00:25:52,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5019190.0, ans=0.125 2024-08-21 00:25:52,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5019190.0, ans=0.2 2024-08-21 00:26:05,029 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 00:26:09,218 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-21 00:26:10,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=12.0 2024-08-21 00:26:18,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5019290.0, ans=0.0 2024-08-21 00:26:29,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5019390.0, ans=0.0 2024-08-21 00:26:34,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. 
limit=10.0 2024-08-21 00:26:45,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5019390.0, ans=0.0 2024-08-21 00:26:59,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.232e+01 2.418e+01 2.707e+01 3.600e+01, threshold=4.835e+01, percent-clipped=0.0 2024-08-21 00:27:17,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5019590.0, ans=0.125 2024-08-21 00:27:23,429 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 15 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-21 00:27:33,078 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13000, loss[loss=0.09277, beats_loss=0.01092, ecapa_loss=0.0001828, whisper_loss=0.08002, over 19998.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001385, whisper_loss=0.08956, over 3771884.19 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:28:10,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5019890.0, ans=0.125 2024-08-21 00:28:24,002 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-21 00:28:37,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5019990.0, ans=0.0 2024-08-21 00:28:40,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5019990.0, ans=0.05 2024-08-21 00:28:53,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5020090.0, ans=0.05 2024-08-21 00:28:56,105 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-21 00:29:09,107 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 00:29:10,489 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13050, loss[loss=0.1047, beats_loss=0.01017, ecapa_loss=0.0001197, whisper_loss=0.09331, over 22214.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.000139, whisper_loss=0.08945, over 3810672.16 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:29:13,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.02 vs. limit=10.0 2024-08-21 00:29:23,425 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-21 00:29:33,902 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-21 00:29:43,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5020290.0, ans=0.125 2024-08-21 00:29:52,647 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 00:30:01,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5020390.0, ans=0.125 2024-08-21 00:30:15,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.250e+01 2.442e+01 2.810e+01 6.229e+01, threshold=4.884e+01, percent-clipped=1.0 2024-08-21 00:30:28,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5020590.0, ans=0.2 2024-08-21 00:30:44,120 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13100, loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001525, whisper_loss=0.09043, over 21586.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001388, whisper_loss=0.08939, over 3819171.64 frames. 
], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:31:46,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5020990.0, ans=0.125 2024-08-21 00:32:13,909 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09580767154693604, model_norm_threshold=48.835636138916016 2024-08-21 00:32:14,075 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.692e+04, grad_sumsq=4.692e+04, orig_rms_sq=1.000e+00 2024-08-21 00:32:19,230 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13150, loss[loss=0.09888, beats_loss=0.009471, ecapa_loss=0.0001773, whisper_loss=0.08764, over 20448.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001404, whisper_loss=0.08912, over 3791158.23 frames. ], batch size: 86, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:32:43,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5021290.0, ans=0.2 2024-08-21 00:33:13,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5021490.0, ans=0.0 2024-08-21 00:33:19,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5021490.0, ans=0.125 2024-08-21 00:33:21,261 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
23 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-21 00:33:22,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.398e+01 2.549e+01 2.951e+01 5.097e+02, threshold=5.098e+01, percent-clipped=2.0 2024-08-21 00:33:33,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5021590.0, ans=0.0 2024-08-21 00:33:37,202 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 00:33:52,581 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13200, loss[loss=0.1172, beats_loss=0.009242, ecapa_loss=0.0001502, whisper_loss=0.1065, over 21004.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01051, ecapa_loss=0.0001399, whisper_loss=0.08892, over 3833749.01 frames. ], batch size: 82, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:33:53,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5021690.0, ans=0.125 2024-08-21 00:34:20,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5021790.0, ans=0.1 2024-08-21 00:34:24,786 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 24 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-21 00:34:29,674 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 14 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-21 00:34:45,166 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-21 00:34:51,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5021890.0, ans=0.125 2024-08-21 00:35:02,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. 
limit=12.0 2024-08-21 00:35:22,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.04 vs. limit=22.5 2024-08-21 00:35:28,933 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 21 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 00:35:34,096 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13250, loss[loss=0.1009, beats_loss=0.01079, ecapa_loss=0.0001758, whisper_loss=0.08839, over 21686.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.08887, over 3810204.61 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:35:44,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-08-21 00:35:45,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5022190.0, ans=0.95 2024-08-21 00:35:50,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5022290.0, ans=0.0 2024-08-21 00:35:55,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5022290.0, ans=0.125 2024-08-21 00:36:15,060 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 00:36:22,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5022390.0, ans=0.2 2024-08-21 00:36:23,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0 2024-08-21 00:36:29,459 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 
18 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-21 00:36:29,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5022490.0, ans=0.0 2024-08-21 00:36:37,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.371e+01 2.560e+01 2.906e+01 3.702e+02, threshold=5.119e+01, percent-clipped=3.0 2024-08-21 00:36:47,501 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 11 from Vox, 37 fro AS 2024-08-21 00:37:08,388 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13300, loss[loss=0.1311, beats_loss=0.006906, ecapa_loss=0.0001381, whisper_loss=0.1228, over 20325.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.0889, over 3840826.54 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:37:25,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5022790.0, ans=0.0 2024-08-21 00:38:00,714 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 25 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-21 00:38:11,985 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
38 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-21 00:38:17,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5022990.0, ans=0.0 2024-08-21 00:38:21,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5022990.0, ans=0.125 2024-08-21 00:38:28,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5023090.0, ans=0.125 2024-08-21 00:38:39,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5023090.0, ans=0.2 2024-08-21 00:38:43,914 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13350, loss[loss=0.1232, beats_loss=0.009229, ecapa_loss=0.0001437, whisper_loss=0.1125, over 17763.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.08973, over 3820819.80 frames. ], batch size: 70, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:38:45,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5023190.0, ans=0.025 2024-08-21 00:38:46,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5023190.0, ans=0.0 2024-08-21 00:39:07,756 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 00:39:09,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5023290.0, ans=0.1 2024-08-21 00:39:21,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5023390.0, ans=0.0 2024-08-21 00:39:27,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5023390.0, ans=0.0 2024-08-21 00:39:34,971 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-21 00:39:38,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5023490.0, ans=0.125 2024-08-21 00:39:38,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5023490.0, ans=0.0 2024-08-21 00:39:49,237 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.315e+01 2.502e+01 2.886e+01 3.923e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-21 00:40:18,744 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13400, loss[loss=0.09847, beats_loss=0.01269, ecapa_loss=0.0001099, whisper_loss=0.08468, over 13267.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001401, whisper_loss=0.08963, over 3784965.33 frames. ], batch size: 52, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:40:26,909 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 00:40:37,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5023790.0, ans=0.125 2024-08-21 00:40:56,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5023890.0, ans=0.0 2024-08-21 00:41:31,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5024090.0, ans=0.0 2024-08-21 00:41:43,156 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-21 00:41:46,500 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-21 00:41:47,590 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13450, loss[loss=0.1056, beats_loss=0.006953, ecapa_loss=0.0001233, whisper_loss=0.09737, over 15076.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001409, whisper_loss=0.08927, over 3767798.06 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:41:57,386 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 37 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-21 00:42:00,704 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 39 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 00:42:07,580 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 00:42:13,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5024290.0, ans=0.0 2024-08-21 00:42:18,437 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.229e-02 2024-08-21 00:42:36,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.73 vs. 
limit=22.5 2024-08-21 00:42:42,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5024490.0, ans=0.1 2024-08-21 00:42:44,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5024490.0, ans=0.125 2024-08-21 00:42:51,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.233e+01 2.383e+01 2.688e+01 3.683e+01, threshold=4.765e+01, percent-clipped=0.0 2024-08-21 00:43:22,792 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13500, loss[loss=0.08325, beats_loss=0.0111, ecapa_loss=9.788e-05, whisper_loss=0.07117, over 15106.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.00014, whisper_loss=0.08972, over 3776965.74 frames. ], batch size: 55, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:43:54,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5024790.0, ans=0.125 2024-08-21 00:43:57,600 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 22 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-21 00:44:10,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5024890.0, ans=0.07 2024-08-21 00:44:18,943 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-21 00:44:34,054 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 00:44:41,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-21 00:44:59,204 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13550, loss[loss=0.09406, beats_loss=0.009224, ecapa_loss=0.000127, whisper_loss=0.08356, over 14145.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.09031, over 3788893.29 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:45:35,664 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 23 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-21 00:45:44,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5025390.0, ans=0.0 2024-08-21 00:45:45,211 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 10 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-21 00:45:48,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5025390.0, ans=0.125 2024-08-21 00:46:07,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.237e+01 2.540e+01 2.882e+01 4.860e+01, threshold=5.081e+01, percent-clipped=1.0 2024-08-21 00:46:18,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-21 00:46:22,553 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 34 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 00:46:32,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5025590.0, ans=0.1 2024-08-21 00:46:34,838 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13600, loss[loss=0.1125, beats_loss=0.007403, ecapa_loss=0.0001811, whisper_loss=0.1033, over 13622.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001393, whisper_loss=0.08998, over 3773008.14 frames. 
], batch size: 57, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:46:35,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5025690.0, ans=0.125 2024-08-21 00:46:38,846 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 7 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 00:46:39,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5025690.0, ans=0.125 2024-08-21 00:46:43,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5025690.0, ans=0.125 2024-08-21 00:46:50,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5025690.0, ans=0.5 2024-08-21 00:46:53,496 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 29 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 00:47:14,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2024-08-21 00:47:15,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5025890.0, ans=0.125 2024-08-21 00:47:22,455 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-21 00:47:34,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5025990.0, ans=0.0 2024-08-21 00:47:35,182 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-21 00:47:48,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5025990.0, ans=0.125 2024-08-21 00:47:55,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5026090.0, ans=0.125 2024-08-21 00:47:56,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5026090.0, ans=0.125 2024-08-21 00:48:04,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.51 vs. limit=10.0 2024-08-21 00:48:09,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=10.0 2024-08-21 00:48:09,544 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13650, loss[loss=0.1012, beats_loss=0.01144, ecapa_loss=0.0001491, whisper_loss=0.08826, over 22506.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001389, whisper_loss=0.09044, over 3814054.67 frames. ], batch size: 94, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:48:19,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5026190.0, ans=0.07 2024-08-21 00:48:38,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5026290.0, ans=0.04949747468305833 2024-08-21 00:48:52,073 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-21 00:48:58,049 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
24 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-21 00:49:00,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5026390.0, ans=0.0 2024-08-21 00:49:22,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.304e+01 2.539e+01 2.805e+01 5.664e+01, threshold=5.078e+01, percent-clipped=1.0 2024-08-21 00:49:31,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5026490.0, ans=0.0 2024-08-21 00:49:52,289 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13700, loss[loss=0.1073, beats_loss=0.006114, ecapa_loss=0.0001372, whisper_loss=0.09983, over 15220.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001401, whisper_loss=0.09085, over 3827359.44 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:49:54,397 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 28 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-21 00:50:25,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5026790.0, ans=0.125 2024-08-21 00:50:36,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5026890.0, ans=0.125 2024-08-21 00:50:38,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5026890.0, ans=0.125 2024-08-21 00:50:54,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5026990.0, ans=0.0 2024-08-21 00:51:27,008 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-21 00:51:27,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5027090.0, ans=0.125 2024-08-21 00:51:30,592 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 00:51:32,324 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13750, loss[loss=0.08863, beats_loss=0.01056, ecapa_loss=0.000125, whisper_loss=0.07682, over 20197.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.0902, over 3835015.19 frames. ], batch size: 78, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:51:42,965 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 22 from LS+wenet, 20 from Vox, 50 fro AS 2024-08-21 00:51:46,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2024-08-21 00:52:01,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5027290.0, ans=0.0 2024-08-21 00:52:14,887 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-21 00:52:19,354 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 00:52:26,105 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 00:52:36,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5027490.0, ans=0.2 2024-08-21 00:52:45,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.353e+01 2.699e+01 3.002e+01 5.030e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-21 00:53:07,484 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 00:53:16,615 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13800, loss[loss=0.09488, beats_loss=0.009998, ecapa_loss=0.0001482, whisper_loss=0.0834, over 13968.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.09024, over 3786473.26 frames. ], batch size: 54, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:53:28,559 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 00:53:33,864 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 00:53:34,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5027790.0, ans=0.1 2024-08-21 00:53:37,734 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-21 00:53:40,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5027790.0, ans=0.2 2024-08-21 00:53:56,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5027890.0, ans=0.125 2024-08-21 00:53:57,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5027890.0, ans=0.125 2024-08-21 00:54:05,200 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 20 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-21 00:54:05,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2024-08-21 00:54:17,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5027990.0, ans=0.0 2024-08-21 00:54:18,633 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
37 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-21 00:54:29,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5028090.0, ans=0.125 2024-08-21 00:54:44,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5028090.0, ans=0.1 2024-08-21 00:54:49,148 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13850, loss[loss=0.1218, beats_loss=0.009815, ecapa_loss=0.0001196, whisper_loss=0.1108, over 18991.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001387, whisper_loss=0.08993, over 3805542.94 frames. ], batch size: 73, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:55:04,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5028190.0, ans=0.125 2024-08-21 00:55:26,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5028290.0, ans=0.05 2024-08-21 00:55:35,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=5028390.0, ans=10.0 2024-08-21 00:55:40,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=5028390.0, ans=0.025 2024-08-21 00:55:58,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.193e+01 2.425e+01 2.661e+01 8.724e+01, threshold=4.850e+01, percent-clipped=1.0 2024-08-21 00:56:13,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.10 vs. limit=10.0 2024-08-21 00:56:25,536 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13900, loss[loss=0.07116, beats_loss=0.0136, ecapa_loss=0.0001518, whisper_loss=0.05604, over 19910.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001379, whisper_loss=0.08974, over 3789333.06 frames. ], batch size: 83, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:56:32,687 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 19 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-21 00:56:36,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5028690.0, ans=0.1 2024-08-21 00:56:42,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5028790.0, ans=0.1 2024-08-21 00:56:57,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5028790.0, ans=0.125 2024-08-21 00:57:01,014 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 00:57:25,598 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-21 00:57:34,838 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 29 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 00:57:49,136 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 00:57:57,569 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 13950, loss[loss=0.1046, beats_loss=0.00974, ecapa_loss=0.0001682, whisper_loss=0.09315, over 18370.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001386, whisper_loss=0.08957, over 3793039.98 frames. ], batch size: 76, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 00:58:21,497 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 13 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-21 00:58:23,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.26 vs. 
limit=15.0 2024-08-21 00:58:32,248 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 22 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-21 00:58:32,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5029290.0, ans=0.07 2024-08-21 00:58:36,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5029290.0, ans=0.0 2024-08-21 00:58:43,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5029390.0, ans=0.0 2024-08-21 00:58:47,175 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 15 from LS+wenet, 9 from Vox, 39 fro AS 2024-08-21 00:59:03,718 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 00:59:08,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=5029490.0, ans=10.0 2024-08-21 00:59:12,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.187e+01 2.492e+01 2.707e+01 4.586e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 00:59:23,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.62 vs. limit=10.0 2024-08-21 00:59:43,123 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14000, loss[loss=0.09697, beats_loss=0.01029, ecapa_loss=0.0001556, whisper_loss=0.08513, over 19581.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001381, whisper_loss=0.09004, over 3796289.91 frames. 
], batch size: 80, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 00:59:50,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5029690.0, ans=0.125 2024-08-21 01:00:07,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5029790.0, ans=0.125 2024-08-21 01:00:19,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5029790.0, ans=0.125 2024-08-21 01:00:20,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-08-21 01:00:45,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5029890.0, ans=0.5 2024-08-21 01:00:45,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5029890.0, ans=0.125 2024-08-21 01:00:54,332 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 10 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-21 01:01:04,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-21 01:01:10,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5030090.0, ans=0.125 2024-08-21 01:01:19,445 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-21 01:01:20,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5030090.0, ans=0.0 2024-08-21 01:01:24,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5030090.0, ans=0.1 2024-08-21 01:01:27,264 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-21 01:01:27,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5030090.0, ans=0.125 2024-08-21 01:01:30,784 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14050, loss[loss=0.0956, beats_loss=0.01249, ecapa_loss=0.0001268, whisper_loss=0.08185, over 21954.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01052, ecapa_loss=0.0001386, whisper_loss=0.0891, over 3788256.79 frames. ], batch size: 92, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:01:53,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5030290.0, ans=0.125 2024-08-21 01:02:11,195 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 27 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 01:02:24,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5030390.0, ans=0.125 2024-08-21 01:02:42,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.263e+01 2.516e+01 2.757e+01 1.194e+02, threshold=5.032e+01, percent-clipped=1.0 2024-08-21 01:02:42,393 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-21 01:02:45,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5030490.0, ans=0.125 2024-08-21 01:03:13,526 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14100, loss[loss=0.08147, beats_loss=0.01615, ecapa_loss=9.512e-05, whisper_loss=0.06437, over 19950.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001387, whisper_loss=0.08924, over 3762953.60 frames. ], batch size: 79, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:03:20,510 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-21 01:03:23,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5030690.0, ans=0.07 2024-08-21 01:03:39,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. 
limit=10.0 2024-08-21 01:03:44,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5030790.0, ans=0.0 2024-08-21 01:03:46,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=5030790.0, ans=0.5 2024-08-21 01:04:10,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5030890.0, ans=0.125 2024-08-21 01:04:12,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5030990.0, ans=0.0 2024-08-21 01:04:21,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5030990.0, ans=0.0 2024-08-21 01:04:31,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.40 vs. limit=15.0 2024-08-21 01:04:46,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.26 vs. limit=15.0 2024-08-21 01:04:50,695 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14150, loss[loss=0.1076, beats_loss=0.008667, ecapa_loss=0.0001359, whisper_loss=0.09754, over 17924.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001385, whisper_loss=0.08919, over 3764268.85 frames. ], batch size: 68, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:04:58,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5031190.0, ans=0.2 2024-08-21 01:05:14,063 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 
24 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-21 01:05:17,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5031290.0, ans=0.1 2024-08-21 01:05:19,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5031290.0, ans=0.125 2024-08-21 01:05:29,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5031290.0, ans=0.2 2024-08-21 01:05:40,230 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.181e+05 2024-08-21 01:05:40,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5031390.0, ans=0.125 2024-08-21 01:05:42,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5 2024-08-21 01:05:59,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5031490.0, ans=0.125 2024-08-21 01:06:02,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.285e+01 2.566e+01 2.926e+01 5.021e+02, threshold=5.132e+01, percent-clipped=5.0 2024-08-21 01:06:14,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5031590.0, ans=0.0 2024-08-21 01:06:29,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5031590.0, ans=0.125 2024-08-21 01:06:30,133 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
28 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-21 01:06:35,453 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14200, loss[loss=0.1128, beats_loss=0.008781, ecapa_loss=0.00017, whisper_loss=0.1024, over 19820.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001386, whisper_loss=0.08935, over 3768227.01 frames. ], batch size: 81, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:06:40,078 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 28 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 01:06:43,025 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 22 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-21 01:06:50,736 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 22 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-21 01:07:08,439 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 01:07:10,519 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 17 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 01:07:12,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5031890.0, ans=0.125 2024-08-21 01:07:17,275 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:07:29,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5031890.0, ans=0.125 2024-08-21 01:07:55,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-21 01:08:03,569 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 01:08:09,676 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14250, loss[loss=0.1154, beats_loss=0.009557, ecapa_loss=0.0001449, whisper_loss=0.1044, over 22962.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.09071, over 3801180.51 frames. ], batch size: 93, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:08:12,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5032190.0, ans=0.125 2024-08-21 01:08:14,056 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:08:14,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5032190.0, ans=0.125 2024-08-21 01:08:36,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5032290.0, ans=0.125 2024-08-21 01:08:55,928 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.720e-02 2024-08-21 01:09:01,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=5032390.0, ans=0.05 2024-08-21 01:09:04,034 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 27 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-21 01:09:16,629 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.235e+01 2.454e+01 2.805e+01 6.761e+01, threshold=4.908e+01, percent-clipped=2.0 2024-08-21 01:09:42,154 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14300, loss[loss=0.07374, beats_loss=0.01129, ecapa_loss=0.0001392, whisper_loss=0.06106, over 13927.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001386, whisper_loss=0.0903, over 3792975.66 frames. ], batch size: 53, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:09:52,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.84 vs. 
limit=10.0 2024-08-21 01:10:03,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5032790.0, ans=0.125 2024-08-21 01:10:04,402 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 33 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-21 01:10:04,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5032790.0, ans=0.125 2024-08-21 01:10:17,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5032890.0, ans=0.1 2024-08-21 01:10:29,984 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 20 from LS+wenet, 10 from Vox, 19 fro AS 2024-08-21 01:10:48,609 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 01:10:50,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5032990.0, ans=0.2 2024-08-21 01:10:51,389 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-21 01:11:16,739 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-21 01:11:18,147 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14350, loss[loss=0.1029, beats_loss=0.008974, ecapa_loss=0.0001719, whisper_loss=0.09216, over 21992.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01035, ecapa_loss=0.0001386, whisper_loss=0.09093, over 3776200.25 frames. 
], batch size: 92, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:11:20,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5033190.0, ans=0.125 2024-08-21 01:11:40,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5033290.0, ans=0.0 2024-08-21 01:11:51,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.87 vs. limit=10.0 2024-08-21 01:12:05,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-08-21 01:12:12,555 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-21 01:12:19,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=5033490.0, ans=0.2 2024-08-21 01:12:24,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.295e+01 2.538e+01 2.825e+01 4.751e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-21 01:12:35,475 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 25 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-21 01:12:36,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5033590.0, ans=0.125 2024-08-21 01:12:38,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.86 vs. limit=5.0 2024-08-21 01:12:48,417 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 19 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-21 01:12:49,776 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14400, loss[loss=0.1109, beats_loss=0.009455, ecapa_loss=0.0001247, whisper_loss=0.1002, over 15682.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001377, whisper_loss=0.09052, over 3746804.16 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:12:50,012 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-21 01:13:01,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5033690.0, ans=0.125 2024-08-21 01:13:03,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5033690.0, ans=0.0 2024-08-21 01:13:07,907 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 27 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-21 01:13:26,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5033790.0, ans=0.0 2024-08-21 01:14:20,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5034090.0, ans=0.1 2024-08-21 01:14:20,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5034090.0, ans=0.2 2024-08-21 01:14:24,126 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 35 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 01:14:32,008 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14450, loss[loss=0.1284, beats_loss=0.01005, ecapa_loss=0.0001271, whisper_loss=0.1171, over 16163.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01033, ecapa_loss=0.0001383, whisper_loss=0.09092, over 3784817.65 frames. 
], batch size: 62, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:14:52,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5034290.0, ans=0.0 2024-08-21 01:14:54,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5034290.0, ans=0.125 2024-08-21 01:14:58,880 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-21 01:15:40,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.267e+01 2.442e+01 2.819e+01 1.713e+02, threshold=4.884e+01, percent-clipped=1.0 2024-08-21 01:15:43,420 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-21 01:15:46,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2024-08-21 01:15:55,217 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 34 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-21 01:16:05,793 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14500, loss[loss=0.1126, beats_loss=0.00987, ecapa_loss=0.0001286, whisper_loss=0.1014, over 22542.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001381, whisper_loss=0.0909, over 3786608.85 frames. ], batch size: 90, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:17:23,985 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 28 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-21 01:17:29,351 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2024-08-21 01:17:44,085 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14550, loss[loss=0.1084, beats_loss=0.009546, ecapa_loss=0.0001648, whisper_loss=0.09724, over 14225.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001388, whisper_loss=0.09078, over 3796430.60 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:17:57,777 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-21 01:18:18,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5035290.0, ans=0.07 2024-08-21 01:18:33,211 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 01:18:41,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5035490.0, ans=0.2 2024-08-21 01:18:54,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.274e+01 2.528e+01 2.801e+01 4.515e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-21 01:19:00,672 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 27 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 01:19:11,093 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-21 01:19:21,683 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14600, loss[loss=0.114, beats_loss=0.01005, ecapa_loss=0.0001569, whisper_loss=0.1023, over 20790.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001396, whisper_loss=0.0901, over 3800653.72 frames. ], batch size: 85, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:19:22,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5035690.0, ans=0.1 2024-08-21 01:19:38,403 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-21 01:20:24,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5035990.0, ans=0.07 2024-08-21 01:20:32,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5035990.0, ans=0.1 2024-08-21 01:20:43,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5036090.0, ans=0.1 2024-08-21 01:20:57,012 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14650, loss[loss=0.1147, beats_loss=0.01045, ecapa_loss=0.0001382, whisper_loss=0.1028, over 22653.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001385, whisper_loss=0.0899, over 3803988.91 frames. ], batch size: 92, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:21:15,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=12.0 2024-08-21 01:21:43,812 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-21 01:21:48,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5036390.0, ans=0.0 2024-08-21 01:21:55,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=5036490.0, ans=10.0 2024-08-21 01:22:03,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5036490.0, ans=0.125 2024-08-21 01:22:06,128 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.288e+01 2.569e+01 2.805e+01 8.601e+01, threshold=5.137e+01, percent-clipped=2.0 2024-08-21 01:22:14,590 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
20 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-21 01:22:17,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2024-08-21 01:22:30,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.94 vs. limit=15.0 2024-08-21 01:22:32,568 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14700, loss[loss=0.1171, beats_loss=0.007909, ecapa_loss=0.0001217, whisper_loss=0.108, over 22838.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.0001377, whisper_loss=0.08971, over 3808336.43 frames. ], batch size: 85, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:22:38,988 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 01:22:50,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=5036790.0, ans=0.025 2024-08-21 01:22:58,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-21 01:23:08,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5036790.0, ans=10.0 2024-08-21 01:23:13,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5036890.0, ans=0.0 2024-08-21 01:23:22,138 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 01:23:35,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5036990.0, ans=0.125 2024-08-21 01:23:41,366 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
25 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-21 01:23:48,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5037090.0, ans=0.1 2024-08-21 01:23:56,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5037090.0, ans=0.125 2024-08-21 01:24:08,877 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14750, loss[loss=0.1122, beats_loss=0.0116, ecapa_loss=0.0001665, whisper_loss=0.09897, over 20511.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001389, whisper_loss=0.08991, over 3818442.19 frames. ], batch size: 88, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:24:17,943 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 01:24:26,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=5037190.0, ans=0.05 2024-08-21 01:24:29,130 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 32 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-21 01:24:42,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5037290.0, ans=0.0 2024-08-21 01:24:58,585 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-21 01:25:02,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5037390.0, ans=0.125 2024-08-21 01:25:09,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5037490.0, ans=0.1 2024-08-21 01:25:10,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5037490.0, ans=0.125 2024-08-21 01:25:10,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5037490.0, ans=0.125 2024-08-21 01:25:20,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.178e+01 2.451e+01 2.819e+01 4.132e+01, threshold=4.902e+01, percent-clipped=0.0 2024-08-21 01:25:30,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5037590.0, ans=0.125 2024-08-21 01:25:47,093 INFO [train_multi_KD3.py:1117] (3/4) Epoch 34, batch 14800, loss[loss=0.1023, beats_loss=0.01105, ecapa_loss=0.0001211, whisper_loss=0.09, over 23068.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001391, whisper_loss=0.09025, over 3832028.22 frames. ], batch size: 90, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:25:48,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5037690.0, ans=0.5 2024-08-21 01:26:25,279 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 0, loss[loss=0.1321, beats_loss=0.006471, ecapa_loss=0.000144, whisper_loss=0.1242, over 15590.00 frames. ], tot_loss[loss=0.1321, beats_loss=0.006471, ecapa_loss=0.000144, whisper_loss=0.1242, over 15590.00 frames. 
], batch size: 59, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:26:25,279 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-21 01:27:00,166 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2488, over 931116.00 frames. 2024-08-21 01:27:22,741 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on SV_voxceleb1: loss=0.003936, beats_loss=0, ecapa_loss=0.0003936, whisper_loss=0, over 944235.00 frames. 2024-08-21 01:28:59,294 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on AT_audioset: loss=0.02305, beats_loss=0.02305, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 01:28:59,297 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-21 01:29:21,863 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 01:29:27,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5037850.0, ans=0.125 2024-08-21 01:29:35,856 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-21 01:29:43,054 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-21 01:29:47,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=5037850.0, ans=0.95 2024-08-21 01:29:54,186 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 21 from LS+wenet, 27 from Vox, 19 fro AS 2024-08-21 01:30:12,220 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
16 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-21 01:30:28,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5038050.0, ans=0.125 2024-08-21 01:30:40,239 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-21 01:31:05,020 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 50, loss[loss=0.1128, beats_loss=0.006964, ecapa_loss=0.0001745, whisper_loss=0.1041, over 15649.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.008914, ecapa_loss=0.0001438, whisper_loss=0.09157, over 873786.07 frames. ], batch size: 62, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:31:08,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5038250.0, ans=0.0 2024-08-21 01:31:47,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5038350.0, ans=0.1 2024-08-21 01:32:18,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.525e+01 2.864e+01 3.213e+01 4.437e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-21 01:32:59,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5038650.0, ans=0.1 2024-08-21 01:33:13,879 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 100, loss[loss=0.1199, beats_loss=0.008822, ecapa_loss=0.0001114, whisper_loss=0.11, over 23382.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.009048, ecapa_loss=0.0001399, whisper_loss=0.09028, over 1472941.85 frames. 
], batch size: 85, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:33:17,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5038750.0, ans=0.125 2024-08-21 01:33:56,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5038850.0, ans=0.2 2024-08-21 01:34:18,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5038950.0, ans=0.1 2024-08-21 01:34:26,766 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 01:34:28,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-08-21 01:34:31,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.65 vs. limit=6.0 2024-08-21 01:34:51,097 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 26 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-21 01:35:19,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5039250.0, ans=0.0 2024-08-21 01:35:19,890 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 150, loss[loss=0.1063, beats_loss=0.01101, ecapa_loss=0.0001344, whisper_loss=0.09395, over 21898.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009167, ecapa_loss=0.0001403, whisper_loss=0.09004, over 1984289.47 frames. ], batch size: 88, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:35:24,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5039250.0, ans=0.125 2024-08-21 01:35:27,784 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 
13 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-21 01:35:49,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5039350.0, ans=0.2 2024-08-21 01:35:49,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5039350.0, ans=0.0 2024-08-21 01:35:52,289 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 01:35:58,938 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 01:36:09,054 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-21 01:36:23,194 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-21 01:36:25,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.462e+01 2.718e+01 2.997e+01 1.008e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-21 01:36:34,826 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 01:36:54,519 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-21 01:37:08,663 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 200, loss[loss=0.09422, beats_loss=0.008847, ecapa_loss=0.0001766, whisper_loss=0.08361, over 20898.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.009354, ecapa_loss=0.00014, whisper_loss=0.09057, over 2362822.57 frames. ], batch size: 87, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:37:18,353 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-21 01:37:28,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5039850.0, ans=0.0 2024-08-21 01:37:31,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5039850.0, ans=0.125 2024-08-21 01:37:35,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5039850.0, ans=0.2 2024-08-21 01:37:59,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.92 vs. limit=15.0 2024-08-21 01:38:02,477 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 20 from LS+wenet, 28 from Vox, 48 fro AS 2024-08-21 01:38:09,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5040050.0, ans=0.04949747468305833 2024-08-21 01:38:19,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5040050.0, ans=0.0 2024-08-21 01:38:37,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5040150.0, ans=0.0 2024-08-21 01:38:41,743 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 250, loss[loss=0.1111, beats_loss=0.01044, ecapa_loss=0.0001593, whisper_loss=0.09905, over 18871.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.00968, ecapa_loss=0.000138, whisper_loss=0.09052, over 2635906.14 frames. ], batch size: 77, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:38:46,140 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 01:38:58,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.23 vs. 
limit=10.0 2024-08-21 01:39:09,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5040350.0, ans=0.125 2024-08-21 01:39:21,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5040450.0, ans=0.125 2024-08-21 01:39:39,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.294e+01 2.516e+01 2.828e+01 4.079e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-21 01:39:50,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-21 01:39:55,351 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 28 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-21 01:40:01,223 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-21 01:40:17,473 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 300, loss[loss=0.1002, beats_loss=0.01273, ecapa_loss=0.000125, whisper_loss=0.0862, over 21847.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.009892, ecapa_loss=0.0001386, whisper_loss=0.08951, over 2866726.54 frames. ], batch size: 89, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:40:24,402 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 16 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-21 01:40:48,275 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 35 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-21 01:40:56,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5040950.0, ans=0.1 2024-08-21 01:41:04,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5040950.0, ans=0.0 2024-08-21 01:41:12,573 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-21 01:41:32,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5041150.0, ans=0.1 2024-08-21 01:41:41,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5041150.0, ans=0.07 2024-08-21 01:41:51,470 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 350, loss[loss=0.1039, beats_loss=0.009425, ecapa_loss=0.0001461, whisper_loss=0.09297, over 15088.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01004, ecapa_loss=0.0001376, whisper_loss=0.08927, over 3098921.09 frames. ], batch size: 58, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:41:56,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-08-21 01:42:14,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5041350.0, ans=0.125 2024-08-21 01:42:16,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5041350.0, ans=0.0 2024-08-21 01:42:22,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5041350.0, ans=0.125 2024-08-21 01:42:43,554 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.323e+01 2.496e+01 2.850e+01 5.461e+01, threshold=4.991e+01, percent-clipped=1.0 2024-08-21 01:42:51,593 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 01:42:53,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. 
limit=10.0 2024-08-21 01:42:54,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=15.0 2024-08-21 01:43:11,096 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-21 01:43:14,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5041650.0, ans=0.5 2024-08-21 01:43:19,307 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 400, loss[loss=0.1106, beats_loss=0.009399, ecapa_loss=0.0001296, whisper_loss=0.09991, over 22135.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01013, ecapa_loss=0.0001374, whisper_loss=0.08891, over 3253660.50 frames. ], batch size: 86, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:43:48,746 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 01:44:05,159 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-21 01:44:18,296 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 24 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-21 01:44:18,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5042050.0, ans=0.0 2024-08-21 01:44:20,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2024-08-21 01:44:28,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5042050.0, ans=0.125 2024-08-21 01:44:36,051 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-21 01:44:41,212 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-21 01:44:50,437 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 450, loss[loss=0.08947, beats_loss=0.01387, ecapa_loss=5.971e-05, whisper_loss=0.07501, over 15674.00 frames. ], tot_loss[loss=0.09978, beats_loss=0.0103, ecapa_loss=0.0001365, whisper_loss=0.08812, over 3365598.41 frames. ], batch size: 58, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:45:13,737 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 01:45:14,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5042350.0, ans=0.1 2024-08-21 01:45:34,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5042450.0, ans=0.0 2024-08-21 01:45:37,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5042450.0, ans=0.125 2024-08-21 01:45:39,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-21 01:45:39,954 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 39 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-21 01:45:43,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. limit=15.0 2024-08-21 01:45:43,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+01 2.267e+01 2.494e+01 2.807e+01 3.587e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-21 01:45:44,000 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
17 from LS+wenet, 13 from Vox, 23 from AS
2024-08-21 01:45:57,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5042550.0, ans=0.0
2024-08-21 01:46:02,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=12.0
2024-08-21 01:46:20,988 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 500, loss[loss=0.09281, beats_loss=0.01013, ecapa_loss=0.000174, whisper_loss=0.08093, over 21402.00 frames. ], tot_loss[loss=0.09949, beats_loss=0.01029, ecapa_loss=0.0001367, whisper_loss=0.08783, over 3433876.40 frames. ], batch size: 92, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:46:48,831 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 28 from LS+wenet, 28 from Vox, 26 from AS
2024-08-21 01:46:50,534 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 from AS
2024-08-21 01:46:53,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5042850.0, ans=0.125
2024-08-21 01:46:53,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5042850.0, ans=0.125
2024-08-21 01:47:13,998 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 from AS
2024-08-21 01:47:31,396 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 11 from LS+wenet, 17 from Vox, 22 from AS
2024-08-21 01:47:33,719 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 01:47:44,266 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-21 01:47:45,161 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 26 from Vox, 31 from AS
2024-08-21 01:47:50,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5043150.0, ans=0.125
2024-08-21 01:47:58,085 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 550, loss[loss=0.08994, beats_loss=0.01093, ecapa_loss=0.0001087, whisper_loss=0.07793, over 16903.00 frames. ], tot_loss[loss=0.09939, beats_loss=0.01024, ecapa_loss=0.000137, whisper_loss=0.08778, over 3463422.90 frames. ], batch size: 64, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:47:58,327 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 from AS
2024-08-21 01:48:02,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5043250.0, ans=0.0
2024-08-21 01:48:17,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5043350.0, ans=0.0
2024-08-21 01:48:30,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.69 vs. limit=22.5
2024-08-21 01:48:43,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5043450.0, ans=0.125
2024-08-21 01:48:54,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.277e+01 2.484e+01 2.909e+01 4.062e+02, threshold=4.967e+01, percent-clipped=2.0
2024-08-21 01:49:30,489 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 600, loss[loss=0.1119, beats_loss=0.008396, ecapa_loss=0.0001525, whisper_loss=0.102, over 14074.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01013, ecapa_loss=0.000137, whisper_loss=0.08892, over 3525710.12 frames. ], batch size: 57, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:49:35,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5043750.0, ans=0.125
2024-08-21 01:50:17,700 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 26 from LS+wenet, 23 from Vox, 26 from AS
2024-08-21 01:50:18,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5043950.0, ans=10.0
2024-08-21 01:50:30,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5044050.0, ans=0.125
2024-08-21 01:50:52,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5044150.0, ans=0.1
2024-08-21 01:51:00,064 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 650, loss[loss=0.1208, beats_loss=0.009311, ecapa_loss=0.0001479, whisper_loss=0.11, over 21438.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01013, ecapa_loss=0.0001368, whisper_loss=0.0893, over 3586019.92 frames. ], batch size: 82, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:51:04,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5044250.0, ans=0.2
2024-08-21 01:51:07,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5044250.0, ans=0.2
2024-08-21 01:51:09,323 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 17 from LS+wenet, 19 from Vox, 35 from AS
2024-08-21 01:51:42,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5044450.0, ans=0.0
2024-08-21 01:51:45,329 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 from AS
2024-08-21 01:51:51,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5044550.0, ans=0.1
2024-08-21 01:51:51,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.231e+01 2.431e+01 2.762e+01 3.963e+01, threshold=4.863e+01, percent-clipped=0.0
2024-08-21 01:52:05,083 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 from AS
2024-08-21 01:52:17,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5044650.0, ans=0.1
2024-08-21 01:52:27,918 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 700, loss[loss=0.1041, beats_loss=0.009223, ecapa_loss=0.0001629, whisper_loss=0.09325, over 19816.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01015, ecapa_loss=0.0001363, whisper_loss=0.08962, over 3627309.57 frames. ], batch size: 81, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:52:33,475 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 from AS
2024-08-21 01:52:50,872 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 from AS
2024-08-21 01:52:59,733 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 15 from Vox, 32 from AS
2024-08-21 01:53:23,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5045050.0, ans=0.035
2024-08-21 01:53:45,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0
2024-08-21 01:53:48,570 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 from AS
2024-08-21 01:53:56,829 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 750, loss[loss=0.1056, beats_loss=0.01009, ecapa_loss=0.0001258, whisper_loss=0.09428, over 20790.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01014, ecapa_loss=0.0001358, whisper_loss=0.0891, over 3645736.93 frames. ], batch size: 83, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:54:08,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0
2024-08-21 01:54:08,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.50 vs. limit=6.0
2024-08-21 01:54:11,062 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 24 from Vox, 17 from AS
2024-08-21 01:54:40,920 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 27 from LS+wenet, 19 from Vox, 48 from AS
2024-08-21 01:54:49,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.245e+01 2.497e+01 2.753e+01 9.624e+01, threshold=4.995e+01, percent-clipped=1.0
2024-08-21 01:54:54,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5045550.0, ans=0.125
2024-08-21 01:55:06,119 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 19 from LS+wenet, 20 from Vox, 16 from AS
2024-08-21 01:55:08,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5045650.0, ans=0.0
2024-08-21 01:55:25,997 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 800, loss[loss=0.1034, beats_loss=0.009595, ecapa_loss=0.000138, whisper_loss=0.09243, over 23164.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01014, ecapa_loss=0.0001369, whisper_loss=0.0888, over 3681791.65 frames. ], batch size: 91, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:55:32,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5045750.0, ans=0.125
2024-08-21 01:55:37,104 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 13 from LS+wenet, 23 from Vox, 20 from AS
2024-08-21 01:56:00,838 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 from AS
2024-08-21 01:56:18,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5046050.0, ans=0.125
2024-08-21 01:56:31,551 WARNING [optim.py:496] (3/4) Scaling gradients by 0.057700227946043015, model_norm_threshold=49.945823669433594
2024-08-21 01:56:31,719 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.361e+05, grad_sumsq=1.361e+05, orig_rms_sq=1.000e+00
2024-08-21 01:56:39,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5046150.0, ans=0.0
2024-08-21 01:56:45,908 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS
2024-08-21 01:56:53,841 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 850, loss[loss=0.08991, beats_loss=0.009074, ecapa_loss=0.0001533, whisper_loss=0.0793, over 13544.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01015, ecapa_loss=0.0001374, whisper_loss=0.08849, over 3671094.30 frames. ], batch size: 58, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:57:03,422 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 20 from LS+wenet, 13 from Vox, 40 from AS
2024-08-21 01:57:16,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5046350.0, ans=0.125
2024-08-21 01:57:28,048 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 25 from LS+wenet, 15 from Vox, 30 from AS
2024-08-21 01:57:48,502 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.265e+01 2.514e+01 2.854e+01 8.656e+02, threshold=5.028e+01, percent-clipped=3.0
2024-08-21 01:57:51,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5046550.0, ans=0.1
2024-08-21 01:57:58,255 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 from AS
2024-08-21 01:58:05,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5046650.0, ans=0.025
2024-08-21 01:58:15,985 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 25 from LS+wenet, 13 from Vox, 20 from AS
2024-08-21 01:58:19,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5046650.0, ans=0.0
2024-08-21 01:58:25,202 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 900, loss[loss=0.1042, beats_loss=0.009439, ecapa_loss=0.0001534, whisper_loss=0.09324, over 14004.00 frames. ], tot_loss[loss=0.09954, beats_loss=0.01015, ecapa_loss=0.0001364, whisper_loss=0.08803, over 3652869.76 frames. ], batch size: 56, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 01:58:33,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0
2024-08-21 01:58:37,906 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 from AS
2024-08-21 01:58:53,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0
2024-08-21 01:59:05,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5046950.0, ans=0.1
2024-08-21 01:59:06,920 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 from AS
2024-08-21 01:59:07,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5046950.0, ans=0.125
2024-08-21 01:59:22,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5047050.0, ans=0.125
2024-08-21 01:59:28,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5047050.0, ans=0.125
2024-08-21 01:59:35,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5047050.0, ans=0.0
2024-08-21 01:59:55,750 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 950, loss[loss=0.09189, beats_loss=0.01105, ecapa_loss=0.0001591, whisper_loss=0.07925, over 20676.00 frames. ], tot_loss[loss=0.09948, beats_loss=0.01019, ecapa_loss=0.0001361, whisper_loss=0.08792, over 3666244.20 frames. ], batch size: 87, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 02:00:04,632 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS
2024-08-21 02:00:13,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5047350.0, ans=0.125
2024-08-21 02:00:31,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5047450.0, ans=0.1
2024-08-21 02:00:41,219 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 from AS
2024-08-21 02:00:43,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5047450.0, ans=0.2
2024-08-21 02:00:48,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.167e+01 2.360e+01 2.609e+01 1.184e+02, threshold=4.721e+01, percent-clipped=1.0
2024-08-21 02:00:57,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5047550.0, ans=0.2
2024-08-21 02:01:04,034 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 24 from LS+wenet, 13 from Vox, 33 from AS
2024-08-21 02:01:07,605 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 from AS
2024-08-21 02:01:11,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5047650.0, ans=0.125
2024-08-21 02:01:23,454 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1000, loss[loss=0.1022, beats_loss=0.009789, ecapa_loss=0.0001596, whisper_loss=0.09079, over 16142.00 frames. ], tot_loss[loss=0.09936, beats_loss=0.01023, ecapa_loss=0.0001376, whisper_loss=0.08776, over 3663628.36 frames. ], batch size: 64, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 02:01:52,565 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 21 from LS+wenet, 28 from Vox, 42 from AS
2024-08-21 02:01:53,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=12.0
2024-08-21 02:01:55,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0
2024-08-21 02:02:02,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.81 vs. limit=6.0
2024-08-21 02:02:17,540 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 from AS
2024-08-21 02:02:32,056 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS
2024-08-21 02:02:44,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0
2024-08-21 02:02:46,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5048150.0, ans=0.125
2024-08-21 02:02:46,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5048150.0, ans=0.125
2024-08-21 02:02:53,584 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1050, loss[loss=0.07274, beats_loss=0.01235, ecapa_loss=0.0001385, whisper_loss=0.059, over 21047.00 frames. ], tot_loss[loss=0.09949, beats_loss=0.01024, ecapa_loss=0.0001378, whisper_loss=0.08787, over 3684588.87 frames. ], batch size: 84, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17
2024-08-21 02:03:21,227 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 02:03:32,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5048450.0, ans=0.125
2024-08-21 02:03:40,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5048450.0, ans=0.125
2024-08-21 02:03:48,654 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS
2024-08-21 02:03:50,214 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.332e+01 2.561e+01 2.821e+01 8.058e+01, threshold=5.122e+01, percent-clipped=2.0
2024-08-21 02:03:57,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5
2024-08-21 02:04:27,062 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1100, loss[loss=0.08927, beats_loss=0.009381, ecapa_loss=0.0001803, whisper_loss=0.07809, over 14381.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.0102, ecapa_loss=0.0001369, whisper_loss=0.08836, over 3678380.71 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:04:33,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5048750.0, ans=0.0
2024-08-21 02:05:16,454 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 22 from LS+wenet, 31 from Vox, 36 from AS
2024-08-21 02:05:20,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0
2024-08-21 02:05:53,885 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 from AS
2024-08-21 02:05:58,280 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1150, loss[loss=0.1089, beats_loss=0.008013, ecapa_loss=0.0001486, whisper_loss=0.09935, over 16190.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01018, ecapa_loss=0.0001364, whisper_loss=0.08911, over 3713071.11 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:06:22,934 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 18 from LS+wenet, 12 from Vox, 22 from AS
2024-08-21 02:06:27,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=5049350.0, ans=0.2
2024-08-21 02:06:30,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5049350.0, ans=0.125
2024-08-21 02:06:39,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5049450.0, ans=0.125
2024-08-21 02:06:48,871 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 22 from LS+wenet, 12 from Vox, 24 from AS
2024-08-21 02:06:50,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.348e+01 2.579e+01 2.822e+01 4.118e+01, threshold=5.158e+01, percent-clipped=0.0
2024-08-21 02:06:56,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5049550.0, ans=0.1
2024-08-21 02:07:25,362 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1200, loss[loss=0.1199, beats_loss=0.008669, ecapa_loss=0.0001438, whisper_loss=0.1098, over 15257.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01021, ecapa_loss=0.0001363, whisper_loss=0.08888, over 3702895.43 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:07:39,132 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 from AS
2024-08-21 02:08:05,940 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 from AS
2024-08-21 02:08:08,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5049950.0, ans=0.125
2024-08-21 02:08:12,347 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 from AS
2024-08-21 02:08:12,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=5049950.0, ans=10.0
2024-08-21 02:08:20,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0
2024-08-21 02:08:21,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0
2024-08-21 02:08:52,682 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1250, loss[loss=0.08067, beats_loss=0.01009, ecapa_loss=0.0001324, whisper_loss=0.06926, over 14614.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01023, ecapa_loss=0.0001368, whisper_loss=0.08902, over 3728639.73 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:08:59,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=5050250.0, ans=15.0
2024-08-21 02:09:00,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5050250.0, ans=0.0
2024-08-21 02:09:09,493 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 from AS
2024-08-21 02:09:28,459 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 02:09:29,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5050450.0, ans=0.95
2024-08-21 02:09:38,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5050450.0, ans=0.125
2024-08-21 02:09:46,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.192e+01 2.364e+01 2.564e+01 4.097e+01, threshold=4.729e+01, percent-clipped=0.0
2024-08-21 02:09:51,671 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS
2024-08-21 02:09:57,987 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5
2024-08-21 02:09:58,950 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 from AS
2024-08-21 02:10:00,704 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 16 from Vox, 40 from AS
2024-08-21 02:10:04,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.11 vs. limit=10.0
2024-08-21 02:10:07,731 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 34 from LS+wenet, 25 from Vox, 30 from AS
2024-08-21 02:10:07,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5050650.0, ans=0.0
2024-08-21 02:10:07,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5050650.0, ans=0.0
2024-08-21 02:10:23,330 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1300, loss[loss=0.07559, beats_loss=0.01108, ecapa_loss=0.0001262, whisper_loss=0.06325, over 14060.00 frames. ], tot_loss[loss=0.09992, beats_loss=0.01026, ecapa_loss=0.0001371, whisper_loss=0.08829, over 3712279.68 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:10:59,482 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 28 from LS+wenet, 18 from Vox, 36 from AS
2024-08-21 02:11:04,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5050950.0, ans=0.125
2024-08-21 02:11:08,115 WARNING [optim.py:496] (3/4) Scaling gradients by 0.01577102579176426, model_norm_threshold=47.28926467895508
2024-08-21 02:11:08,284 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.852e+06, grad_sumsq=2.852e+06, orig_rms_sq=1.000e+00
2024-08-21 02:11:10,774 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 02:11:21,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5051050.0, ans=0.125
2024-08-21 02:11:25,504 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 29 from LS+wenet, 20 from Vox, 46 from AS
2024-08-21 02:11:33,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5051150.0, ans=0.125
2024-08-21 02:11:37,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0
2024-08-21 02:11:45,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5051150.0, ans=0.125
2024-08-21 02:11:51,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0
2024-08-21 02:11:52,006 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1350, loss[loss=0.1075, beats_loss=0.009621, ecapa_loss=0.0001151, whisper_loss=0.09675, over 15478.00 frames. ], tot_loss[loss=0.09955, beats_loss=0.0105, ecapa_loss=0.0001357, whisper_loss=0.08769, over 3718382.10 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:12:07,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5051250.0, ans=0.1
2024-08-21 02:12:17,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5051350.0, ans=0.0
2024-08-21 02:12:31,995 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 20 from LS+wenet, 25 from Vox, 25 from AS
2024-08-21 02:12:39,416 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 from AS
2024-08-21 02:12:47,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.226e+01 2.529e+01 2.867e+01 2.998e+03, threshold=5.057e+01, percent-clipped=1.0
2024-08-21 02:13:10,204 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 38 from LS+wenet, 21 from Vox, 33 from AS
2024-08-21 02:13:23,048 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1400, loss[loss=0.09119, beats_loss=0.01106, ecapa_loss=0.0001577, whisper_loss=0.07855, over 21930.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01047, ecapa_loss=0.0001356, whisper_loss=0.08794, over 3755711.04 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:13:33,856 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 7 from LS+wenet, 21 from Vox, 35 from AS
2024-08-21 02:13:38,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0
2024-08-21 02:13:46,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5051850.0, ans=0.0
2024-08-21 02:14:11,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5051950.0, ans=0.125
2024-08-21 02:14:17,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5051950.0, ans=0.125
2024-08-21 02:14:19,989 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 19 from LS+wenet, 22 from Vox, 18 from AS
2024-08-21 02:14:31,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.73 vs. limit=22.5
2024-08-21 02:14:54,686 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 17 from Vox, 32 from AS
2024-08-21 02:14:55,677 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1450, loss[loss=0.08513, beats_loss=0.01145, ecapa_loss=0.0001166, whisper_loss=0.07252, over 16888.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01043, ecapa_loss=0.0001358, whisper_loss=0.08828, over 3746831.57 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:14:57,792 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 18 from Vox, 23 from AS
2024-08-21 02:15:37,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0
2024-08-21 02:15:38,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5052450.0, ans=0.025
2024-08-21 02:15:46,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5052450.0, ans=0.125
2024-08-21 02:15:49,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.283e+01 2.598e+01 2.871e+01 4.818e+01, threshold=5.197e+01, percent-clipped=0.0
2024-08-21 02:16:20,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2024-08-21 02:16:34,712 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 from AS
2024-08-21 02:16:42,135 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1500, loss[loss=0.1047, beats_loss=0.008722, ecapa_loss=0.0001399, whisper_loss=0.0946, over 21541.00 frames. ], tot_loss[loss=0.09967, beats_loss=0.01043, ecapa_loss=0.0001348, whisper_loss=0.0879, over 3732068.63 frames. ], batch size: 85, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:16:43,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5052750.0, ans=0.2
2024-08-21 02:16:53,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5052750.0, ans=0.1
2024-08-21 02:16:56,290 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 from AS
2024-08-21 02:17:17,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0
2024-08-21 02:17:47,918 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0
2024-08-21 02:18:16,708 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1550, loss[loss=0.1143, beats_loss=0.009567, ecapa_loss=0.0001435, whisper_loss=0.1033, over 20132.00 frames. ], tot_loss[loss=0.09938, beats_loss=0.01038, ecapa_loss=0.0001362, whisper_loss=0.08763, over 3709410.90 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:18:19,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5053250.0, ans=0.125
2024-08-21 02:18:25,079 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 16 from LS+wenet, 12 from Vox, 35 from AS
2024-08-21 02:18:31,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0
2024-08-21 02:18:35,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=5053350.0, ans=0.2
2024-08-21 02:18:37,401 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 from AS
2024-08-21 02:18:53,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5053450.0, ans=0.125
2024-08-21 02:18:59,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5053450.0, ans=0.2
2024-08-21 02:19:01,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5053450.0, ans=0.2
2024-08-21 02:19:12,481 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 from AS
2024-08-21 02:19:13,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+01 2.172e+01 2.388e+01 2.761e+01 1.037e+02, threshold=4.777e+01, percent-clipped=1.0
2024-08-21 02:19:36,733 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 35 from LS+wenet, 22 from Vox, 37 from AS
2024-08-21 02:19:50,435 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1600, loss[loss=0.07809, beats_loss=0.01107, ecapa_loss=0.0001451, whisper_loss=0.06556, over 22335.00 frames. ], tot_loss[loss=0.09971, beats_loss=0.01038, ecapa_loss=0.0001349, whisper_loss=0.08798, over 3739682.49 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:19:54,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5
2024-08-21 02:19:55,897 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 16 from LS+wenet, 15 from Vox, 18 from AS
2024-08-21 02:20:05,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5053750.0, ans=0.125
2024-08-21 02:20:24,992 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 31 from LS+wenet, 20 from Vox, 35 from AS
2024-08-21 02:20:38,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5053950.0, ans=0.125
2024-08-21 02:20:52,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=22.5
2024-08-21 02:21:01,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5054150.0, ans=0.04949747468305833
2024-08-21 02:21:08,629 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 from AS
2024-08-21 02:21:10,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5054150.0, ans=0.0
2024-08-21 02:21:19,803 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 13 from LS+wenet, 23 from Vox, 27 from AS
2024-08-21 02:21:19,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5054250.0, ans=0.125
2024-08-21 02:21:20,729 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1650, loss[loss=0.07303, beats_loss=0.01072, ecapa_loss=0.0001355, whisper_loss=0.06095, over 15639.00 frames. ], tot_loss[loss=0.09935, beats_loss=0.0103, ecapa_loss=0.0001347, whisper_loss=0.0877, over 3752071.14 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:21:39,265 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 from AS
2024-08-21 02:22:14,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.289e+01 2.512e+01 2.829e+01 4.001e+01, threshold=5.024e+01, percent-clipped=0.0
2024-08-21 02:22:15,487 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 30 from LS+wenet, 26 from Vox, 26 from AS
2024-08-21 02:22:23,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5054550.0, ans=0.0
2024-08-21 02:22:26,458 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 from AS
2024-08-21 02:22:42,847 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 from AS
2024-08-21 02:22:51,665 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1700, loss[loss=0.1146, beats_loss=0.01043, ecapa_loss=0.0001303, whisper_loss=0.1029, over 21962.00 frames. ], tot_loss[loss=0.09963, beats_loss=0.01021, ecapa_loss=0.0001357, whisper_loss=0.08806, over 3766151.76 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:22:56,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5054750.0, ans=0.125
2024-08-21 02:23:17,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=12.0
2024-08-21 02:23:24,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=5054850.0, ans=0.5
2024-08-21 02:23:27,492 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 14 from LS+wenet, 10 from Vox, 27 from AS
2024-08-21 02:23:54,558 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 from AS
2024-08-21 02:24:10,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5
2024-08-21 02:24:19,049 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts.
23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 02:24:23,803 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1750, loss[loss=0.1125, beats_loss=0.007733, ecapa_loss=0.0001707, whisper_loss=0.1031, over 15165.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01017, ecapa_loss=0.0001352, whisper_loss=0.08884, over 3738959.21 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:24:30,324 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 02:24:30,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5055250.0, ans=0.07 2024-08-21 02:24:30,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. limit=12.0 2024-08-21 02:24:42,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-08-21 02:24:55,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5055350.0, ans=0.125 2024-08-21 02:24:56,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-08-21 02:24:59,835 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-21 02:25:03,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5055450.0, ans=0.0 2024-08-21 02:25:19,027 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.242e+01 2.435e+01 2.822e+01 2.727e+02, threshold=4.871e+01, percent-clipped=1.0 2024-08-21 02:25:28,124 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 
19 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-21 02:25:40,941 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-21 02:25:55,138 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1800, loss[loss=0.1062, beats_loss=0.009624, ecapa_loss=0.0001395, whisper_loss=0.09522, over 16452.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01016, ecapa_loss=0.0001344, whisper_loss=0.08943, over 3745728.02 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:25:56,841 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 15 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-21 02:26:38,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5055950.0, ans=0.125 2024-08-21 02:27:26,622 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1850, loss[loss=0.1221, beats_loss=0.008452, ecapa_loss=0.0001336, whisper_loss=0.1123, over 20502.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01018, ecapa_loss=0.0001343, whisper_loss=0.08939, over 3764325.43 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:27:30,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5056250.0, ans=0.0 2024-08-21 02:27:46,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5056350.0, ans=0.0 2024-08-21 02:27:49,472 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 12 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-21 02:27:51,061 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-21 02:28:10,900 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.670e-03 2024-08-21 02:28:13,865 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 02:28:18,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-08-21 02:28:21,039 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.279e+01 2.495e+01 2.833e+01 4.581e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-21 02:28:24,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=5056550.0, ans=15.0 2024-08-21 02:28:25,359 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-21 02:28:53,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5056650.0, ans=0.125 2024-08-21 02:28:58,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=22.5 2024-08-21 02:28:58,297 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1900, loss[loss=0.0998, beats_loss=0.01225, ecapa_loss=0.000112, whisper_loss=0.08643, over 22536.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01023, ecapa_loss=0.0001339, whisper_loss=0.08869, over 3698225.27 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:29:06,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5056750.0, ans=0.1 2024-08-21 02:29:26,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5056850.0, ans=0.125 2024-08-21 02:29:35,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5056950.0, ans=0.035 2024-08-21 02:29:42,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5056950.0, ans=0.0 2024-08-21 02:29:54,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5057050.0, ans=0.125 2024-08-21 02:30:07,678 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 02:30:17,288 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 31 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-21 02:30:24,536 INFO [train_multi_KD3.py:845] (3/4) A total of 98 cuts. 25 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-21 02:30:29,923 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 1950, loss[loss=0.1251, beats_loss=0.008683, ecapa_loss=0.0001388, whisper_loss=0.115, over 18865.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01028, ecapa_loss=0.000134, whisper_loss=0.0892, over 3720474.65 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:30:40,946 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-21 02:30:42,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. 
limit=15.0 2024-08-21 02:30:47,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5057350.0, ans=0.1 2024-08-21 02:30:55,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5057350.0, ans=0.2 2024-08-21 02:31:25,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.215e+01 2.451e+01 2.695e+01 5.295e+01, threshold=4.901e+01, percent-clipped=1.0 2024-08-21 02:31:32,959 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-21 02:31:46,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5057650.0, ans=0.125 2024-08-21 02:31:48,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5057650.0, ans=0.0 2024-08-21 02:32:02,173 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2000, loss[loss=0.0717, beats_loss=0.01067, ecapa_loss=0.0001845, whisper_loss=0.05919, over 13564.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01026, ecapa_loss=0.0001347, whisper_loss=0.08852, over 3715352.53 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:32:08,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5057750.0, ans=0.125 2024-08-21 02:32:08,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5057750.0, ans=0.125 2024-08-21 02:32:15,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5057750.0, ans=0.1 2024-08-21 02:32:18,845 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 02:32:40,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2024-08-21 02:32:41,486 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 21 from LS+wenet, 31 from Vox, 38 fro AS 2024-08-21 02:33:34,310 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2050, loss[loss=0.09215, beats_loss=0.009984, ecapa_loss=0.0001613, whisper_loss=0.08055, over 22499.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01024, ecapa_loss=0.0001347, whisper_loss=0.08844, over 3708741.78 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:34:04,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=5058350.0, ans=22.5 2024-08-21 02:34:11,617 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.544e+05 2024-08-21 02:34:13,090 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
26 from LS+wenet, 16 from Vox, 52 fro AS 2024-08-21 02:34:27,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5058450.0, ans=0.125 2024-08-21 02:34:30,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.290e+01 2.506e+01 2.810e+01 1.281e+02, threshold=5.013e+01, percent-clipped=3.0 2024-08-21 02:34:31,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5058550.0, ans=0.1 2024-08-21 02:34:38,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5058550.0, ans=0.125 2024-08-21 02:34:38,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5058550.0, ans=0.0 2024-08-21 02:34:40,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5058550.0, ans=0.125 2024-08-21 02:34:44,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2024-08-21 02:34:55,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5058650.0, ans=0.0 2024-08-21 02:35:06,272 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2100, loss[loss=0.07711, beats_loss=0.01044, ecapa_loss=0.0001552, whisper_loss=0.06511, over 13333.00 frames. ], tot_loss[loss=0.09953, beats_loss=0.0103, ecapa_loss=0.0001344, whisper_loss=0.08788, over 3724572.04 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:35:33,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.12 vs. 
limit=10.0 2024-08-21 02:35:47,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5058950.0, ans=0.0 2024-08-21 02:36:03,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5059050.0, ans=0.04949747468305833 2024-08-21 02:36:12,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=12.0 2024-08-21 02:36:37,083 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2150, loss[loss=0.116, beats_loss=0.008833, ecapa_loss=0.0001345, whisper_loss=0.1059, over 22059.00 frames. ], tot_loss[loss=0.09971, beats_loss=0.01035, ecapa_loss=0.0001331, whisper_loss=0.08803, over 3714087.51 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:36:40,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.14 vs. limit=15.0 2024-08-21 02:37:05,065 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-21 02:37:28,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.78 vs. 
limit=15.0 2024-08-21 02:37:35,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.212e+01 2.472e+01 2.786e+01 4.629e+01, threshold=4.943e+01, percent-clipped=0.0 2024-08-21 02:37:43,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5059550.0, ans=0.125 2024-08-21 02:37:55,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5059650.0, ans=0.0 2024-08-21 02:38:13,163 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2200, loss[loss=0.09468, beats_loss=0.01162, ecapa_loss=9.964e-05, whisper_loss=0.08206, over 16351.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01044, ecapa_loss=0.0001324, whisper_loss=0.08818, over 3718238.30 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:38:16,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5059750.0, ans=0.0 2024-08-21 02:38:44,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5059850.0, ans=0.125 2024-08-21 02:39:03,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5059950.0, ans=10.0 2024-08-21 02:39:15,267 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 18 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 02:39:19,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2024-08-21 02:39:28,244 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
14 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-21 02:39:33,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5060150.0, ans=0.0 2024-08-21 02:39:34,901 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 02:39:44,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5060250.0, ans=0.0 2024-08-21 02:39:45,174 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2250, loss[loss=0.09419, beats_loss=0.01138, ecapa_loss=0.0001106, whisper_loss=0.08171, over 23336.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0105, ecapa_loss=0.0001326, whisper_loss=0.08858, over 3720634.86 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:39:49,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5060250.0, ans=0.125 2024-08-21 02:39:50,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.01 vs. limit=10.0 2024-08-21 02:40:06,257 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 29 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-21 02:40:27,131 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 21 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-21 02:40:29,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5060450.0, ans=0.125 2024-08-21 02:40:39,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.265e+01 2.538e+01 2.956e+01 4.238e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-21 02:41:14,830 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2300, loss[loss=0.1001, beats_loss=0.01388, ecapa_loss=0.0001217, whisper_loss=0.08504, over 22415.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01054, ecapa_loss=0.0001332, whisper_loss=0.0889, over 3745610.78 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:41:15,670 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-21 02:41:25,314 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 35 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-21 02:41:31,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5060750.0, ans=0.1 2024-08-21 02:41:31,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5060750.0, ans=0.0 2024-08-21 02:41:35,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5060850.0, ans=0.125 2024-08-21 02:41:36,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5060850.0, ans=0.125 2024-08-21 02:41:44,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.06 vs. limit=6.0 2024-08-21 02:41:54,893 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 27 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-21 02:42:01,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5060950.0, ans=0.1 2024-08-21 02:42:37,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-08-21 02:42:48,619 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2350, loss[loss=0.09663, beats_loss=0.01276, ecapa_loss=0.0001005, whisper_loss=0.08287, over 22932.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001351, whisper_loss=0.09027, over 3782505.53 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:42:56,689 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 16 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-21 02:43:03,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5061250.0, ans=0.125 2024-08-21 02:43:19,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5061350.0, ans=0.0 2024-08-21 02:43:23,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5061350.0, ans=0.0 2024-08-21 02:43:27,481 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 13 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-21 02:43:27,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5061350.0, ans=0.125 2024-08-21 02:43:29,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5061350.0, ans=0.125 2024-08-21 02:43:36,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5061450.0, ans=0.0 2024-08-21 02:43:36,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5061450.0, ans=0.1 2024-08-21 02:43:50,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.279e+01 2.504e+01 2.806e+01 9.902e+01, threshold=5.007e+01, percent-clipped=2.0 2024-08-21 02:44:15,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5061650.0, ans=0.2 2024-08-21 02:44:24,726 INFO [train_multi_KD3.py:845] (3/4) A 
total of 83 cuts. 32 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 02:44:28,720 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 02:44:31,003 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2400, loss[loss=0.1108, beats_loss=0.009302, ecapa_loss=0.0001485, whisper_loss=0.1, over 21419.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.0001356, whisper_loss=0.09102, over 3803721.43 frames. ], batch size: 84, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:44:44,036 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-21 02:44:46,427 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 22 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-21 02:45:10,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5061850.0, ans=0.125 2024-08-21 02:45:15,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.86 vs. limit=15.0 2024-08-21 02:45:31,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-21 02:45:35,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5062050.0, ans=0.125 2024-08-21 02:45:37,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5062050.0, ans=0.04949747468305833 2024-08-21 02:45:41,328 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 
14 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 02:45:45,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5062050.0, ans=0.125 2024-08-21 02:45:58,248 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-21 02:46:23,533 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2450, loss[loss=0.1076, beats_loss=0.0127, ecapa_loss=9.19e-05, whisper_loss=0.09397, over 15875.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001356, whisper_loss=0.09071, over 3801503.80 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:46:38,022 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 02:46:40,904 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 28 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-21 02:46:44,550 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-21 02:47:01,601 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-21 02:47:06,246 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 02:47:17,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5062450.0, ans=0.125 2024-08-21 02:47:31,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5062450.0, ans=0.0 2024-08-21 02:47:32,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.86 vs. 
limit=22.5 2024-08-21 02:47:34,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.316e+01 2.526e+01 2.772e+01 3.117e+02, threshold=5.053e+01, percent-clipped=1.0 2024-08-21 02:48:24,288 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2500, loss[loss=0.09485, beats_loss=0.01235, ecapa_loss=0.0001428, whisper_loss=0.08107, over 21598.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001346, whisper_loss=0.09063, over 3796301.19 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:48:27,360 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 02:48:29,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5062750.0, ans=0.125 2024-08-21 02:48:35,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5062750.0, ans=0.125 2024-08-21 02:48:39,720 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 36 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-21 02:48:49,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5062850.0, ans=0.04949747468305833 2024-08-21 02:48:49,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-08-21 02:49:04,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5062850.0, ans=0.2 2024-08-21 02:49:05,633 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
32 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-21 02:49:20,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5062950.0, ans=0.0 2024-08-21 02:49:31,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5063050.0, ans=0.125 2024-08-21 02:50:05,219 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 02:50:09,034 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 27 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-21 02:50:13,085 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2550, loss[loss=0.1125, beats_loss=0.01045, ecapa_loss=0.0001298, whisper_loss=0.1007, over 20370.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001347, whisper_loss=0.09076, over 3796916.09 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:50:28,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5063250.0, ans=0.125 2024-08-21 02:50:47,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-08-21 02:50:48,871 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-21 02:51:05,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5063450.0, ans=0.125 2024-08-21 02:51:20,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.281e+01 2.497e+01 2.831e+01 4.835e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-21 02:51:54,198 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
28 from LS+wenet, 22 from Vox, 42 fro AS
2024-08-21 02:52:09,137 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2600, loss[loss=0.08867, beats_loss=0.009776, ecapa_loss=9.642e-05, whisper_loss=0.07793, over 15282.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.000134, whisper_loss=0.09045, over 3800770.84 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:52:22,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5063750.0, ans=0.0
2024-08-21 02:53:03,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5063950.0, ans=0.125
2024-08-21 02:53:04,907 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 25 from LS+wenet, 14 from Vox, 34 fro AS
2024-08-21 02:53:07,855 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS
2024-08-21 02:53:09,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5063950.0, ans=0.2
2024-08-21 02:53:25,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5063950.0, ans=0.09899494936611666
2024-08-21 02:53:53,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5064050.0, ans=0.125
2024-08-21 02:53:58,953 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS
2024-08-21 02:54:11,354 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS
2024-08-21 02:54:21,845 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2650, loss[loss=0.09192, beats_loss=0.009018, ecapa_loss=0.000118, whisper_loss=0.08172, over 16241.00 frames.
], tot_loss[loss=0.1019, beats_loss=0.01029, ecapa_loss=0.0001356, whisper_loss=0.09024, over 3797031.96 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:55:00,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5064350.0, ans=0.125
2024-08-21 02:55:11,682 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS
2024-08-21 02:55:35,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5064450.0, ans=0.1
2024-08-21 02:55:37,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5064450.0, ans=0.0
2024-08-21 02:55:41,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.306e+01 2.544e+01 2.939e+01 3.967e+01, threshold=5.087e+01, percent-clipped=0.0
2024-08-21 02:55:43,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5064550.0, ans=0.0
2024-08-21 02:55:43,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0
2024-08-21 02:55:56,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=12.0
2024-08-21 02:56:32,252 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2700, loss[loss=0.1105, beats_loss=0.00955, ecapa_loss=0.0001527, whisper_loss=0.09942, over 20966.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01026, ecapa_loss=0.0001359, whisper_loss=0.09003, over 3758990.21 frames.
], batch size: 84, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:57:20,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5064850.0, ans=0.5
2024-08-21 02:57:31,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0
2024-08-21 02:57:42,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0
2024-08-21 02:58:26,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5
2024-08-21 02:58:42,138 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2750, loss[loss=0.1088, beats_loss=0.008423, ecapa_loss=0.0001383, whisper_loss=0.099, over 13575.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001351, whisper_loss=0.08953, over 3755330.39 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:59:02,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5065250.0, ans=0.0
2024-08-21 02:59:26,380 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 fro AS
2024-08-21 02:59:43,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5065450.0, ans=0.125
2024-08-21 02:59:59,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.389e+01 2.549e+01 2.769e+01 6.929e+01, threshold=5.098e+01, percent-clipped=1.0
2024-08-21 03:00:00,238 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts.
26 from LS+wenet, 21 from Vox, 47 fro AS
2024-08-21 03:00:19,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5065550.0, ans=0.0
2024-08-21 03:00:20,388 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 39 from LS+wenet, 20 from Vox, 21 fro AS
2024-08-21 03:00:37,795 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS
2024-08-21 03:00:45,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5065650.0, ans=0.0
2024-08-21 03:00:48,658 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2800, loss[loss=0.109, beats_loss=0.01099, ecapa_loss=9.517e-05, whisper_loss=0.09708, over 20954.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001349, whisper_loss=0.08972, over 3770350.25 frames. ], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:00:50,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5065750.0, ans=0.125
2024-08-21 03:00:50,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5065750.0, ans=0.025
2024-08-21 03:01:43,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=5065950.0, ans=0.5
2024-08-21 03:01:47,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5065950.0, ans=0.0
2024-08-21 03:02:56,664 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2850, loss[loss=0.1026, beats_loss=0.01272, ecapa_loss=0.0001435, whisper_loss=0.08846, over 21613.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01035, ecapa_loss=0.0001362, whisper_loss=0.08961, over 3778367.66 frames.
], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:03:28,192 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS
2024-08-21 03:03:43,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.04 vs. limit=10.0
2024-08-21 03:04:02,204 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 fro AS
2024-08-21 03:04:14,244 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.291e+01 2.516e+01 2.868e+01 4.695e+01, threshold=5.032e+01, percent-clipped=0.0
2024-08-21 03:04:18,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5066550.0, ans=0.1
2024-08-21 03:04:39,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5066650.0, ans=0.07
2024-08-21 03:05:07,354 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2900, loss[loss=0.1242, beats_loss=0.008115, ecapa_loss=0.000124, whisper_loss=0.1149, over 19143.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001365, whisper_loss=0.08981, over 3793560.73 frames. ], batch size: 70, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:05:18,257 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 22 from Vox, 18 fro AS
2024-08-21 03:05:22,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5066750.0, ans=0.125
2024-08-21 03:05:25,413 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts.
23 from LS+wenet, 22 from Vox, 33 fro AS
2024-08-21 03:05:43,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5066850.0, ans=0.1
2024-08-21 03:05:50,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5066850.0, ans=0.1
2024-08-21 03:06:04,667 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 12 from LS+wenet, 18 from Vox, 26 fro AS
2024-08-21 03:06:09,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5066950.0, ans=0.125
2024-08-21 03:06:25,688 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 26 from Vox, 23 fro AS
2024-08-21 03:06:35,775 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 19 from LS+wenet, 19 from Vox, 41 fro AS
2024-08-21 03:06:59,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5067150.0, ans=0.125
2024-08-21 03:07:10,184 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 2950, loss[loss=0.08641, beats_loss=0.01256, ecapa_loss=0.0001295, whisper_loss=0.07256, over 17138.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001368, whisper_loss=0.08992, over 3804973.98 frames.
], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:07:27,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5067250.0, ans=0.05
2024-08-21 03:07:51,934 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03678512200713158, model_norm_threshold=50.32452392578125
2024-08-21 03:07:52,101 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.620e+05, grad_sumsq=7.962e+04, orig_rms_sq=3.290e+00
2024-08-21 03:07:55,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0
2024-08-21 03:07:56,000 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 18 from LS+wenet, 12 from Vox, 19 fro AS
2024-08-21 03:08:01,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5067450.0, ans=0.2
2024-08-21 03:08:03,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5067450.0, ans=0.125
2024-08-21 03:08:05,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5067450.0, ans=0.0
2024-08-21 03:08:09,036 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts.
25 from LS+wenet, 21 from Vox, 35 fro AS
2024-08-21 03:08:19,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.280e+01 2.501e+01 2.875e+01 1.368e+03, threshold=5.003e+01, percent-clipped=1.0
2024-08-21 03:08:24,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5067550.0, ans=0.95
2024-08-21 03:08:33,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5067550.0, ans=0.1
2024-08-21 03:08:40,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5067650.0, ans=0.1
2024-08-21 03:08:48,314 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 22 from LS+wenet, 11 from Vox, 37 fro AS
2024-08-21 03:08:59,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5067650.0, ans=0.1
2024-08-21 03:09:02,342 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3000, loss[loss=0.1015, beats_loss=0.01174, ecapa_loss=0.0001039, whisper_loss=0.08877, over 13688.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001369, whisper_loss=0.08943, over 3809043.30 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:09:02,343 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss
2024-08-21 03:09:39,402 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2496, over 931116.00 frames.
2024-08-21 03:09:59,095 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0329, 2.6476, 3.0021, 1.8643, 2.0375, 2.1021, 2.9611, 2.8508], device='cuda:3')
2024-08-21 03:10:01,762 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on SV_voxceleb1: loss=0.003899, beats_loss=0, ecapa_loss=0.0003899, whisper_loss=0, over 944235.00 frames.
2024-08-21 03:11:41,965 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-21 03:11:41,968 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB
2024-08-21 03:11:52,336 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS
2024-08-21 03:11:53,844 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 03:12:10,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5067850.0, ans=0.05
2024-08-21 03:12:12,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5067850.0, ans=0.0
2024-08-21 03:12:17,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5
2024-08-21 03:12:22,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0
2024-08-21 03:12:23,730 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts.
29 from LS+wenet, 18 from Vox, 24 fro AS
2024-08-21 03:12:26,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5067950.0, ans=0.09899494936611666
2024-08-21 03:12:33,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5067950.0, ans=0.2
2024-08-21 03:12:41,711 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS
2024-08-21 03:13:03,383 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 12 from LS+wenet, 12 from Vox, 27 fro AS
2024-08-21 03:13:12,665 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3050, loss[loss=0.1052, beats_loss=0.01029, ecapa_loss=0.0001767, whisper_loss=0.09309, over 21824.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001366, whisper_loss=0.08931, over 3862481.34 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:13:16,197 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS
2024-08-21 03:13:17,166 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 03:13:24,297 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS
2024-08-21 03:13:28,809 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs.
limit=15.0
2024-08-21 03:13:44,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5068350.0, ans=0.125
2024-08-21 03:13:54,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5068450.0, ans=0.125
2024-08-21 03:14:00,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5068450.0, ans=0.125
2024-08-21 03:14:08,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.251e+01 2.540e+01 2.788e+01 3.733e+01, threshold=5.081e+01, percent-clipped=0.0
2024-08-21 03:14:25,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5068650.0, ans=0.125
2024-08-21 03:14:27,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5068650.0, ans=0.125
2024-08-21 03:14:44,377 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3100, loss[loss=0.09597, beats_loss=0.009829, ecapa_loss=0.0001371, whisper_loss=0.08477, over 15448.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001372, whisper_loss=0.08967, over 3808912.65 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 1.152921504606847e+18
2024-08-21 03:14:52,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs.
limit=10.0
2024-08-21 03:14:58,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5068750.0, ans=0.0
2024-08-21 03:15:07,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5068850.0, ans=0.1
2024-08-21 03:15:22,491 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 fro AS
2024-08-21 03:15:36,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5068950.0, ans=0.125
2024-08-21 03:15:46,892 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 fro AS
2024-08-21 03:15:52,541 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS
2024-08-21 03:15:58,549 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS
2024-08-21 03:16:12,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5069150.0, ans=0.1
2024-08-21 03:16:13,558 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 26 from LS+wenet, 15 from Vox, 47 fro AS
2024-08-21 03:16:17,551 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3150, loss[loss=0.1186, beats_loss=0.008113, ecapa_loss=0.0001642, whisper_loss=0.1089, over 17831.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001379, whisper_loss=0.08983, over 3828817.14 frames.
], batch size: 71, lr: 1.75e-03, grad_scale: 1.152921504606847e+18
2024-08-21 03:16:18,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5069250.0, ans=0.0
2024-08-21 03:16:33,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5069250.0, ans=0.125
2024-08-21 03:16:46,173 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS
2024-08-21 03:17:12,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.426e+01 2.655e+01 2.939e+01 1.391e+02, threshold=5.310e+01, percent-clipped=2.0
2024-08-21 03:17:18,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5069550.0, ans=0.0
2024-08-21 03:17:21,313 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS
2024-08-21 03:17:27,068 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 29 from LS+wenet, 20 from Vox, 31 fro AS
2024-08-21 03:17:27,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5069550.0, ans=0.125
2024-08-21 03:17:29,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0
2024-08-21 03:17:47,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5069750.0, ans=0.125
2024-08-21 03:17:48,329 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3200, loss[loss=0.08159, beats_loss=0.01007, ecapa_loss=0.0001664, whisper_loss=0.06986, over 15210.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001387, whisper_loss=0.09001, over 3841267.15 frames.
], batch size: 64, lr: 1.75e-03, grad_scale: 1.152921504606847e+18
2024-08-21 03:17:56,701 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS
2024-08-21 03:18:08,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=22.5
2024-08-21 03:18:13,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5069850.0, ans=0.1
2024-08-21 03:18:15,080 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.657e+00
2024-08-21 03:18:27,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5069950.0, ans=0.125
2024-08-21 03:18:32,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5069950.0, ans=0.0
2024-08-21 03:18:43,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5070050.0, ans=0.0
2024-08-21 03:18:56,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2024-08-21 03:19:04,883 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 29 from LS+wenet, 19 from Vox, 38 fro AS
2024-08-21 03:19:07,421 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.708e+00
2024-08-21 03:19:19,349 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3250, loss[loss=0.1138, beats_loss=0.008819, ecapa_loss=0.0001575, whisper_loss=0.1034, over 22676.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.000139, whisper_loss=0.09039, over 3832230.63 frames.
], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:19:40,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0
2024-08-21 03:19:43,741 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 03:20:01,400 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.667e+00
2024-08-21 03:20:01,740 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2024-08-21 03:20:03,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5070450.0, ans=0.07
2024-08-21 03:20:10,258 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS
2024-08-21 03:20:19,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5070450.0, ans=0.125
2024-08-21 03:20:24,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.311e+01 2.569e+01 2.814e+01 1.085e+02, threshold=5.138e+01, percent-clipped=2.0
2024-08-21 03:20:35,193 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS
2024-08-21 03:20:45,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5070650.0, ans=0.0
2024-08-21 03:20:52,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.32 vs.
limit=15.0
2024-08-21 03:21:06,346 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3300, loss[loss=0.07147, beats_loss=0.009813, ecapa_loss=0.0001406, whisper_loss=0.06025, over 13212.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001394, whisper_loss=0.0897, over 3813181.04 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:21:15,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5070750.0, ans=0.0
2024-08-21 03:21:25,481 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 21 from LS+wenet, 19 from Vox, 16 fro AS
2024-08-21 03:21:29,971 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 20 from LS+wenet, 24 from Vox, 40 fro AS
2024-08-21 03:21:52,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5070950.0, ans=0.0
2024-08-21 03:21:55,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5070950.0, ans=0.1
2024-08-21 03:22:00,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5070950.0, ans=0.125
2024-08-21 03:22:03,829 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS
2024-08-21 03:22:29,916 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS
2024-08-21 03:22:58,851 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3350, loss[loss=0.1014, beats_loss=0.01134, ecapa_loss=0.000138, whisper_loss=0.08872, over 23279.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01056, ecapa_loss=0.0001394, whisper_loss=0.08918, over 3802471.07 frames.
], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:23:26,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=5071350.0, ans=10.0
2024-08-21 03:23:28,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5071350.0, ans=0.125
2024-08-21 03:24:06,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5071550.0, ans=0.0
2024-08-21 03:24:07,181 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS
2024-08-21 03:24:09,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.266e+01 2.448e+01 2.718e+01 4.054e+01, threshold=4.896e+01, percent-clipped=0.0
2024-08-21 03:24:11,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.29 vs. limit=10.0
2024-08-21 03:24:22,730 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS
2024-08-21 03:24:56,846 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3400, loss[loss=0.118, beats_loss=0.008587, ecapa_loss=0.0001533, whisper_loss=0.1079, over 23296.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.0001385, whisper_loss=0.08912, over 3827274.24 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:24:58,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5071750.0, ans=0.1
2024-08-21 03:25:22,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5071850.0, ans=0.125
2024-08-21 03:25:23,855 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts.
24 from LS+wenet, 19 from Vox, 38 fro AS
2024-08-21 03:25:42,227 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 34 from LS+wenet, 23 from Vox, 38 fro AS
2024-08-21 03:25:47,515 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS
2024-08-21 03:26:21,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5072050.0, ans=0.125
2024-08-21 03:26:21,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5072050.0, ans=0.1
2024-08-21 03:26:39,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5072150.0, ans=0.125
2024-08-21 03:26:50,033 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 23 from LS+wenet, 34 from Vox, 31 fro AS
2024-08-21 03:26:51,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5
2024-08-21 03:26:56,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.03 vs. limit=15.0
2024-08-21 03:26:57,265 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3450, loss[loss=0.09914, beats_loss=0.01084, ecapa_loss=0.000142, whisper_loss=0.08687, over 21725.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001394, whisper_loss=0.08969, over 3818304.32 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:27:35,053 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 03:27:39,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs.
limit=15.0
2024-08-21 03:28:09,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.313e+01 2.518e+01 2.811e+01 5.199e+01, threshold=5.037e+01, percent-clipped=1.0
2024-08-21 03:28:14,108 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0
2024-08-21 03:28:34,796 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 19 from LS+wenet, 15 from Vox, 19 fro AS
2024-08-21 03:28:52,418 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3500, loss[loss=0.1296, beats_loss=0.008472, ecapa_loss=0.0001297, whisper_loss=0.1199, over 20562.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001384, whisper_loss=0.0901, over 3827855.27 frames. ], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:29:16,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5072850.0, ans=0.0
2024-08-21 03:29:19,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5072850.0, ans=0.0
2024-08-21 03:29:36,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.52 vs. limit=10.0
2024-08-21 03:29:44,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5072950.0, ans=0.125
2024-08-21 03:29:46,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5072950.0, ans=0.5
2024-08-21 03:30:01,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5073050.0, ans=0.025
2024-08-21 03:30:08,925 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts.
18 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-21 03:30:44,519 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3550, loss[loss=0.1278, beats_loss=0.008232, ecapa_loss=0.0001291, whisper_loss=0.1183, over 19400.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.000139, whisper_loss=0.09002, over 3808954.89 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:30:57,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5073250.0, ans=0.04949747468305833 2024-08-21 03:31:23,492 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-21 03:31:27,748 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 17 from LS+wenet, 20 from Vox, 56 fro AS 2024-08-21 03:31:38,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5073450.0, ans=0.125 2024-08-21 03:31:42,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5073450.0, ans=0.0 2024-08-21 03:31:46,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5073450.0, ans=0.0 2024-08-21 03:31:52,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.266e+01 2.535e+01 2.803e+01 1.045e+02, threshold=5.070e+01, percent-clipped=1.0 2024-08-21 03:31:55,844 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 32 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-21 03:31:57,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-08-21 03:32:04,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5073550.0, ans=0.0 2024-08-21 03:32:38,822 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3600, loss[loss=0.08382, beats_loss=0.01006, ecapa_loss=0.0001861, whisper_loss=0.0719, over 12025.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001394, whisper_loss=0.08983, over 3769393.08 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:33:31,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5073950.0, ans=0.125 2024-08-21 03:33:42,767 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 20 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-21 03:33:44,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5073950.0, ans=0.2 2024-08-21 03:33:56,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5074050.0, ans=0.125 2024-08-21 03:34:02,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2024-08-21 03:34:35,478 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3650, loss[loss=0.09912, beats_loss=0.007892, ecapa_loss=0.0001524, whisper_loss=0.08971, over 14868.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001389, whisper_loss=0.09055, over 3774606.06 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:34:40,847 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 
18 from LS+wenet, 35 from Vox, 43 fro AS 2024-08-21 03:34:42,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5074250.0, ans=0.125 2024-08-21 03:34:43,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2024-08-21 03:34:44,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2024-08-21 03:34:54,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=5074250.0, ans=0.02 2024-08-21 03:35:19,164 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 15 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 03:35:24,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5074450.0, ans=0.125 2024-08-21 03:35:38,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5074450.0, ans=0.0 2024-08-21 03:35:48,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.270e+01 2.492e+01 2.659e+01 4.040e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 03:36:23,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0 2024-08-21 03:36:32,047 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 03:36:34,372 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3700, loss[loss=0.1089, beats_loss=0.01039, ecapa_loss=0.0001506, whisper_loss=0.09701, over 22753.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001396, whisper_loss=0.08946, over 3772241.44 frames. 
], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:36:42,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5074750.0, ans=0.125 2024-08-21 03:36:52,392 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-21 03:37:33,206 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-21 03:38:11,687 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 21 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-21 03:38:25,466 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 38 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-21 03:38:34,078 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3750, loss[loss=0.1071, beats_loss=0.007934, ecapa_loss=0.000148, whisper_loss=0.09767, over 14228.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.00014, whisper_loss=0.08971, over 3778116.99 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:38:36,781 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 25 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-21 03:38:41,759 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 03:38:53,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. 
limit=10.0 2024-08-21 03:38:59,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5075350.0, ans=0.125 2024-08-21 03:39:05,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5075350.0, ans=0.1 2024-08-21 03:39:07,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5075350.0, ans=0.125 2024-08-21 03:39:50,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.197e+01 2.452e+01 2.774e+01 3.553e+01, threshold=4.904e+01, percent-clipped=0.0 2024-08-21 03:40:27,933 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-21 03:40:35,824 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3800, loss[loss=0.1039, beats_loss=0.00581, ecapa_loss=0.0001199, whisper_loss=0.09688, over 15159.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01032, ecapa_loss=0.0001396, whisper_loss=0.08944, over 3760567.83 frames. ], batch size: 50, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:40:36,069 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-21 03:40:52,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-08-21 03:41:22,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5075950.0, ans=0.125 2024-08-21 03:41:27,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.82 vs. 
limit=22.5 2024-08-21 03:41:30,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-08-21 03:41:45,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-08-21 03:42:08,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5076050.0, ans=0.125 2024-08-21 03:42:17,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5076150.0, ans=0.5 2024-08-21 03:42:22,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5076150.0, ans=0.125 2024-08-21 03:42:25,395 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-21 03:42:38,127 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3850, loss[loss=0.07816, beats_loss=0.01203, ecapa_loss=0.0001317, whisper_loss=0.06481, over 22448.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001414, whisper_loss=0.08917, over 3774508.79 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:43:18,849 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-21 03:43:24,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5076450.0, ans=0.125 2024-08-21 03:43:39,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5076450.0, ans=0.0 2024-08-21 03:43:50,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.237e+01 2.453e+01 2.696e+01 3.570e+01, threshold=4.906e+01, percent-clipped=0.0 2024-08-21 03:44:10,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5076550.0, ans=0.125 2024-08-21 03:44:23,171 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 16 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-21 03:44:25,508 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 21 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-21 03:44:27,572 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-21 03:44:31,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5076650.0, ans=0.2 2024-08-21 03:44:38,213 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3900, loss[loss=0.1073, beats_loss=0.01022, ecapa_loss=0.0001668, whisper_loss=0.09537, over 21527.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001411, whisper_loss=0.08879, over 3792822.53 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:45:27,186 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
7 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-21 03:45:31,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5076950.0, ans=0.0 2024-08-21 03:45:43,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5076950.0, ans=0.09899494936611666 2024-08-21 03:45:47,015 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 03:45:50,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5077050.0, ans=0.125 2024-08-21 03:46:07,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5077050.0, ans=0.125 2024-08-21 03:46:24,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5077150.0, ans=0.0 2024-08-21 03:46:39,107 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 3950, loss[loss=0.09923, beats_loss=0.0101, ecapa_loss=0.0001141, whisper_loss=0.08799, over 13862.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01046, ecapa_loss=0.0001405, whisper_loss=0.08863, over 3785977.15 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:46:54,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5077250.0, ans=0.09899494936611666 2024-08-21 03:46:57,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-21 03:47:38,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.90 vs. 
limit=22.5 2024-08-21 03:47:51,689 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.265e+01 2.547e+01 2.985e+01 6.857e+01, threshold=5.095e+01, percent-clipped=1.0 2024-08-21 03:48:02,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5077550.0, ans=0.125 2024-08-21 03:48:09,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=5077550.0, ans=0.05 2024-08-21 03:48:39,328 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4000, loss[loss=0.0936, beats_loss=0.009764, ecapa_loss=0.0001885, whisper_loss=0.08195, over 21251.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001406, whisper_loss=0.08927, over 3798547.35 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:48:45,499 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 03:48:59,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5077750.0, ans=0.125 2024-08-21 03:49:17,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=5077850.0, ans=0.5 2024-08-21 03:49:17,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5077850.0, ans=0.2 2024-08-21 03:49:33,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5077950.0, ans=0.0 2024-08-21 03:49:51,961 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
28 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-21 03:50:29,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5078150.0, ans=0.0 2024-08-21 03:50:48,819 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4050, loss[loss=0.113, beats_loss=0.009757, ecapa_loss=0.0001336, whisper_loss=0.102, over 22400.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.08979, over 3842479.67 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:51:45,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2024-08-21 03:51:47,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5078450.0, ans=0.0 2024-08-21 03:52:10,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.320e+01 2.567e+01 2.893e+01 7.952e+01, threshold=5.134e+01, percent-clipped=3.0 2024-08-21 03:52:40,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.39 vs. limit=22.5 2024-08-21 03:52:48,361 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 37 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 03:52:50,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=5078650.0, ans=0.1 2024-08-21 03:53:01,436 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4100, loss[loss=0.1027, beats_loss=0.01124, ecapa_loss=0.0001043, whisper_loss=0.09045, over 23088.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001399, whisper_loss=0.09066, over 3850053.64 frames. 
], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:53:10,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5078750.0, ans=0.1 2024-08-21 03:53:16,244 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0443761944770813, model_norm_threshold=51.335693359375 2024-08-21 03:53:16,411 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.724e+05, grad_sumsq=2.524e+07, orig_rms_sq=1.079e-02 2024-08-21 03:53:16,687 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 30 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-21 03:53:22,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2024-08-21 03:54:20,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5079050.0, ans=0.125 2024-08-21 03:54:44,540 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-21 03:55:07,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5079150.0, ans=0.125 2024-08-21 03:55:10,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5079250.0, ans=0.1 2024-08-21 03:55:10,839 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4150, loss[loss=0.09551, beats_loss=0.01192, ecapa_loss=0.0001118, whisper_loss=0.08246, over 20404.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001402, whisper_loss=0.09003, over 3848313.47 frames. 
], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:55:20,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=12.0 2024-08-21 03:55:28,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5079250.0, ans=0.5 2024-08-21 03:55:36,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5079350.0, ans=0.2 2024-08-21 03:55:45,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5079350.0, ans=0.0 2024-08-21 03:55:46,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5079350.0, ans=0.125 2024-08-21 03:56:14,948 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-21 03:56:26,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-08-21 03:56:28,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5079550.0, ans=0.125 2024-08-21 03:56:32,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.340e+01 2.592e+01 2.879e+01 1.157e+03, threshold=5.184e+01, percent-clipped=4.0 2024-08-21 03:56:35,719 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
28 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-21 03:56:38,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5079550.0, ans=0.0 2024-08-21 03:56:40,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5079550.0, ans=0.1 2024-08-21 03:57:04,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5079650.0, ans=0.07 2024-08-21 03:57:17,870 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4200, loss[loss=0.09838, beats_loss=0.008973, ecapa_loss=0.0001567, whisper_loss=0.08784, over 22556.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.0907, over 3843341.86 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:57:22,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5079750.0, ans=0.0 2024-08-21 03:57:30,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5079750.0, ans=0.125 2024-08-21 03:57:40,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5079850.0, ans=0.125 2024-08-21 03:58:09,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5079950.0, ans=0.0 2024-08-21 03:58:50,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5080050.0, ans=0.09899494936611666 2024-08-21 03:59:20,461 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4250, loss[loss=0.09721, beats_loss=0.0127, ecapa_loss=0.0001487, whisper_loss=0.08303, over 14473.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001399, whisper_loss=0.09109, over 3864770.94 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:59:26,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-21 03:59:33,939 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-21 04:00:10,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5080450.0, ans=0.0 2024-08-21 04:00:40,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.302e+01 2.518e+01 2.832e+01 1.053e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-21 04:01:13,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5080650.0, ans=0.0 2024-08-21 04:01:29,539 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4300, loss[loss=0.07051, beats_loss=0.01412, ecapa_loss=0.0001243, whisper_loss=0.05514, over 16469.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.09055, over 3873617.82 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:01:33,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5080750.0, ans=0.0 2024-08-21 04:01:52,458 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-21 04:02:04,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.74 vs. limit=15.0 2024-08-21 04:02:15,253 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
25 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-21 04:02:34,024 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 04:03:21,522 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 04:03:23,787 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 22 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 04:03:30,187 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4350, loss[loss=0.09439, beats_loss=0.01249, ecapa_loss=0.0001484, whisper_loss=0.08041, over 18329.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001385, whisper_loss=0.08987, over 3879123.82 frames. ], batch size: 77, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:03:33,304 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2024-08-21 04:03:42,474 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 20 from LS+wenet, 34 from Vox, 38 fro AS 2024-08-21 04:04:11,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5081450.0, ans=0.125 2024-08-21 04:04:20,630 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 19 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-21 04:04:37,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+01 2.205e+01 2.430e+01 2.775e+01 4.634e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 04:04:41,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5081550.0, ans=0.125 2024-08-21 04:04:51,724 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
23 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 04:04:56,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5081550.0, ans=0.125 2024-08-21 04:05:11,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5081650.0, ans=0.2 2024-08-21 04:05:15,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.74 vs. limit=6.0 2024-08-21 04:05:19,895 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4400, loss[loss=0.08667, beats_loss=0.01024, ecapa_loss=0.0001968, whisper_loss=0.07446, over 12760.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001392, whisper_loss=0.089, over 3833554.67 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:05:22,379 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 04:05:25,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5081750.0, ans=0.1 2024-08-21 04:05:29,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.21 vs. 
limit=22.5 2024-08-21 04:05:31,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5081750.0, ans=0.05 2024-08-21 04:05:47,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5081850.0, ans=0.2 2024-08-21 04:05:47,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5081850.0, ans=0.07 2024-08-21 04:05:49,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5081850.0, ans=0.2 2024-08-21 04:05:59,364 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-21 04:06:07,780 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 16 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-21 04:06:09,961 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 14 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-21 04:06:15,356 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-21 04:06:30,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5081950.0, ans=0.1 2024-08-21 04:06:30,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5081950.0, ans=0.125 2024-08-21 04:06:51,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5082050.0, ans=0.09899494936611666 2024-08-21 04:07:04,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5082050.0, ans=0.0 2024-08-21 04:07:06,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5082150.0, ans=0.035 2024-08-21 04:07:06,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5082150.0, ans=0.2 2024-08-21 04:07:09,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-08-21 04:07:31,852 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4450, loss[loss=0.1171, beats_loss=0.007355, ecapa_loss=0.0001619, whisper_loss=0.1082, over 18562.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001391, whisper_loss=0.08917, over 3809503.40 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:07:42,217 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 16 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-21 04:07:51,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5082250.0, ans=0.125 2024-08-21 04:08:04,212 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-21 04:08:31,008 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-21 04:08:35,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5082450.0, ans=0.125 2024-08-21 04:08:47,117 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 26 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 04:08:51,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.213e+01 2.422e+01 2.731e+01 3.413e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-21 04:08:52,037 INFO [train_multi_KD3.py:845] (3/4) A total of 97 cuts. 22 from LS+wenet, 32 from Vox, 43 fro AS 2024-08-21 04:08:53,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5082550.0, ans=0.125 2024-08-21 04:09:02,347 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 18 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-21 04:09:42,529 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4500, loss[loss=0.1217, beats_loss=0.007529, ecapa_loss=0.0001473, whisper_loss=0.1127, over 20596.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01048, ecapa_loss=0.0001393, whisper_loss=0.08851, over 3806930.95 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:09:45,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=22.5 2024-08-21 04:09:48,296 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-21 04:11:02,034 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 12 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 04:11:04,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. 
limit=15.0 2024-08-21 04:11:05,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5083050.0, ans=0.1 2024-08-21 04:11:26,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5083150.0, ans=0.0 2024-08-21 04:11:29,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5083150.0, ans=0.0 2024-08-21 04:11:48,166 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-21 04:11:51,048 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 04:11:53,289 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4550, loss[loss=0.09281, beats_loss=0.01074, ecapa_loss=0.0001242, whisper_loss=0.08082, over 23065.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.08989, over 3805735.59 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:12:03,175 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-21 04:12:04,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2024-08-21 04:12:12,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5083250.0, ans=0.0 2024-08-21 04:12:17,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5083350.0, ans=0.125 2024-08-21 04:12:18,751 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 
14 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-21 04:12:39,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5083350.0, ans=0.2 2024-08-21 04:12:45,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5083450.0, ans=0.125 2024-08-21 04:12:51,094 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-21 04:12:54,579 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-21 04:12:59,024 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.187e+05 2024-08-21 04:13:10,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5083550.0, ans=0.0 2024-08-21 04:13:13,997 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.342e+01 2.629e+01 2.950e+01 5.025e+01, threshold=5.258e+01, percent-clipped=1.0 2024-08-21 04:13:43,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5083650.0, ans=0.125 2024-08-21 04:14:04,398 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4600, loss[loss=0.113, beats_loss=0.008299, ecapa_loss=0.0001136, whisper_loss=0.1036, over 19240.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001391, whisper_loss=0.09006, over 3834093.20 frames. ], batch size: 70, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:14:09,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5083750.0, ans=0.2 2024-08-21 04:14:12,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.09 vs. 
limit=6.0 2024-08-21 04:14:21,591 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-21 04:15:03,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5083950.0, ans=0.0 2024-08-21 04:15:29,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5084050.0, ans=0.125 2024-08-21 04:15:33,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5084050.0, ans=0.0 2024-08-21 04:15:59,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2024-08-21 04:16:07,705 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4650, loss[loss=0.1067, beats_loss=0.01143, ecapa_loss=0.0001215, whisper_loss=0.09409, over 23163.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.09056, over 3853633.55 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:16:35,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5084350.0, ans=0.125 2024-08-21 04:16:54,942 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 26 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-21 04:17:20,314 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 04:17:27,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.290e+01 2.552e+01 2.874e+01 1.481e+02, threshold=5.104e+01, percent-clipped=2.0 2024-08-21 04:17:28,165 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
26 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-21 04:17:30,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5084550.0, ans=0.2 2024-08-21 04:17:51,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5084650.0, ans=0.125 2024-08-21 04:17:57,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5084650.0, ans=0.125 2024-08-21 04:18:14,919 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4700, loss[loss=0.103, beats_loss=0.01128, ecapa_loss=0.0001236, whisper_loss=0.09049, over 20771.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001377, whisper_loss=0.09032, over 3842106.68 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:19:12,052 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-21 04:19:30,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5084950.0, ans=0.125 2024-08-21 04:19:30,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5084950.0, ans=0.125 2024-08-21 04:19:34,562 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 38 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 04:19:57,327 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
33 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-21 04:20:11,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5085150.0, ans=0.125 2024-08-21 04:20:19,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5085150.0, ans=0.0 2024-08-21 04:20:25,053 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4750, loss[loss=0.09359, beats_loss=0.01234, ecapa_loss=0.0001203, whisper_loss=0.08005, over 14753.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001384, whisper_loss=0.09049, over 3858651.09 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:20:40,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5085250.0, ans=0.125 2024-08-21 04:20:49,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5085350.0, ans=0.1 2024-08-21 04:20:51,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5085350.0, ans=0.1 2024-08-21 04:21:05,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5085350.0, ans=0.0 2024-08-21 04:21:33,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5085450.0, ans=0.0 2024-08-21 04:21:44,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.249e+01 2.433e+01 2.723e+01 6.483e+01, threshold=4.865e+01, percent-clipped=1.0 2024-08-21 04:21:49,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5085550.0, ans=0.0 2024-08-21 04:22:01,266 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5085550.0, ans=0.125 2024-08-21 04:22:08,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-08-21 04:22:23,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2024-08-21 04:22:33,059 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4800, loss[loss=0.08597, beats_loss=0.01188, ecapa_loss=0.0001235, whisper_loss=0.07285, over 18103.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001381, whisper_loss=0.09083, over 3857520.37 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:22:36,591 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-21 04:22:38,774 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 29 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-21 04:22:41,036 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 04:22:47,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5085750.0, ans=0.1 2024-08-21 04:23:12,263 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-21 04:23:26,752 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-21 04:24:05,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5086050.0, ans=0.125 2024-08-21 04:24:14,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5086150.0, ans=0.0 2024-08-21 04:24:39,289 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4850, loss[loss=0.06856, beats_loss=0.0144, ecapa_loss=0.0001404, whisper_loss=0.05275, over 18630.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.09082, over 3869028.40 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:24:57,288 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-21 04:25:09,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-21 04:25:10,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5086350.0, ans=0.2 2024-08-21 04:25:30,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2024-08-21 04:25:48,603 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-21 04:25:52,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5086550.0, ans=0.0 2024-08-21 04:25:56,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.255e+01 2.433e+01 2.647e+01 4.364e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-21 04:25:56,367 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-21 04:26:15,298 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 17 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-21 04:26:31,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5086650.0, ans=0.125 2024-08-21 04:26:38,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5086650.0, ans=0.125 2024-08-21 04:26:42,417 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4900, loss[loss=0.105, beats_loss=0.009521, ecapa_loss=0.0001175, whisper_loss=0.09428, over 14754.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.0001396, whisper_loss=0.09119, over 3892314.47 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:26:44,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-08-21 04:27:00,199 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 04:27:34,230 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.647e+01 2024-08-21 04:27:35,114 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-21 04:27:43,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5086950.0, ans=0.125 2024-08-21 04:28:21,535 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
20 from LS+wenet, 31 from Vox, 43 fro AS 2024-08-21 04:28:49,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5087150.0, ans=0.0 2024-08-21 04:28:49,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=15.0 2024-08-21 04:28:52,729 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 4950, loss[loss=0.104, beats_loss=0.01043, ecapa_loss=0.0001352, whisper_loss=0.09227, over 16962.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.09108, over 3890931.34 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:28:52,923 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-21 04:28:59,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=15.0 2024-08-21 04:29:04,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5087250.0, ans=0.125 2024-08-21 04:29:04,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5087250.0, ans=0.0 2024-08-21 04:29:24,046 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.42 vs. 
limit=22.5 2024-08-21 04:29:47,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5087450.0, ans=0.0 2024-08-21 04:30:14,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.296e+01 2.472e+01 2.859e+01 4.220e+01, threshold=4.943e+01, percent-clipped=0.0 2024-08-21 04:30:24,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5087550.0, ans=0.125 2024-08-21 04:30:59,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5087650.0, ans=10.0 2024-08-21 04:31:04,951 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5000, loss[loss=0.1216, beats_loss=0.007695, ecapa_loss=0.0001331, whisper_loss=0.1126, over 18499.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0103, ecapa_loss=0.0001388, whisper_loss=0.09113, over 3859631.14 frames. ], batch size: 70, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:31:21,521 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 04:31:40,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5087850.0, ans=0.125 2024-08-21 04:31:50,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5087850.0, ans=0.035 2024-08-21 04:31:53,945 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 26 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 04:33:08,082 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5050, loss[loss=0.0875, beats_loss=0.01324, ecapa_loss=9.978e-05, whisper_loss=0.07326, over 16952.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01027, ecapa_loss=0.0001389, whisper_loss=0.09097, over 3841290.74 frames. 
], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:33:50,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5088350.0, ans=0.125 2024-08-21 04:34:07,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5088450.0, ans=0.125 2024-08-21 04:34:24,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.200e+01 2.422e+01 2.716e+01 3.329e+01, threshold=4.844e+01, percent-clipped=0.0 2024-08-21 04:35:01,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=5088650.0, ans=15.0 2024-08-21 04:35:05,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5088650.0, ans=0.125 2024-08-21 04:35:10,908 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5100, loss[loss=0.09132, beats_loss=0.01051, ecapa_loss=0.0001503, whisper_loss=0.07931, over 17827.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01029, ecapa_loss=0.0001394, whisper_loss=0.09061, over 3831576.05 frames. ], batch size: 70, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:35:22,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5088750.0, ans=0.0 2024-08-21 04:35:27,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5088750.0, ans=0.1 2024-08-21 04:35:27,724 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 04:35:35,063 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 04:36:47,553 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-21 04:36:49,921 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-21 04:36:56,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5089150.0, ans=0.1 2024-08-21 04:37:05,334 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5150, loss[loss=0.1135, beats_loss=0.01061, ecapa_loss=0.0001285, whisper_loss=0.1016, over 23389.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.09073, over 3864792.01 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:37:21,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5089250.0, ans=0.125 2024-08-21 04:37:31,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5089350.0, ans=0.125 2024-08-21 04:37:51,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5089450.0, ans=0.07 2024-08-21 04:37:52,879 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 12 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 04:38:09,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5089550.0, ans=0.2 2024-08-21 04:38:09,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2024-08-21 04:38:12,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.286e+01 2.550e+01 3.060e+01 1.523e+02, threshold=5.101e+01, percent-clipped=5.0 2024-08-21 04:38:27,520 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 04:38:47,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5089650.0, ans=0.125 2024-08-21 04:38:50,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5089650.0, ans=0.2 2024-08-21 04:38:56,238 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5200, loss[loss=0.1174, beats_loss=0.01091, ecapa_loss=0.0001434, whisper_loss=0.1051, over 23007.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001394, whisper_loss=0.0912, over 3892612.83 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:39:11,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-08-21 04:39:20,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5089850.0, ans=0.0 2024-08-21 04:39:33,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.71 vs. limit=22.5 2024-08-21 04:39:38,793 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-21 04:39:46,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2024-08-21 04:40:47,202 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5250, loss[loss=0.08741, beats_loss=0.01071, ecapa_loss=0.0001357, whisper_loss=0.07534, over 20735.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001382, whisper_loss=0.09115, over 3878999.76 frames. 
], batch size: 86, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:40:58,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5090250.0, ans=0.125 2024-08-21 04:41:16,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5090350.0, ans=0.07 2024-08-21 04:41:20,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5090350.0, ans=0.125 2024-08-21 04:41:37,938 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-21 04:41:39,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5090450.0, ans=0.125 2024-08-21 04:41:43,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5090450.0, ans=0.125 2024-08-21 04:41:57,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2024-08-21 04:41:58,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.269e+01 2.486e+01 2.907e+01 3.986e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-21 04:42:04,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5090550.0, ans=0.2 2024-08-21 04:42:16,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5090550.0, ans=0.125 2024-08-21 04:42:25,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5090650.0, ans=0.0 2024-08-21 04:42:26,967 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
32 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-21 04:42:31,710 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 20 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-21 04:42:40,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5090650.0, ans=0.125 2024-08-21 04:42:42,844 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5300, loss[loss=0.08331, beats_loss=0.01226, ecapa_loss=0.0001332, whisper_loss=0.06972, over 15485.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001381, whisper_loss=0.09095, over 3876739.44 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:42:46,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-21 04:43:12,560 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 04:43:22,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5090850.0, ans=0.125 2024-08-21 04:43:47,808 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 18 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-21 04:43:58,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5091050.0, ans=0.125 2024-08-21 04:44:34,692 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 20 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-21 04:44:40,039 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 04:44:42,693 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5350, loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0001354, whisper_loss=0.09215, over 17553.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001378, whisper_loss=0.09008, over 3835841.97 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:44:59,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5091250.0, ans=0.95 2024-08-21 04:45:06,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5091350.0, ans=0.2 2024-08-21 04:45:50,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2024-08-21 04:45:57,412 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.646e-03 2024-08-21 04:46:00,047 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.175e+01 2.396e+01 2.648e+01 3.168e+01, threshold=4.792e+01, percent-clipped=0.0 2024-08-21 04:46:00,235 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-21 04:46:13,190 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-21 04:46:17,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5091550.0, ans=0.09899494936611666 2024-08-21 04:46:47,149 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 04:46:47,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5091750.0, ans=0.125 2024-08-21 04:46:48,413 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5400, loss[loss=0.1056, beats_loss=0.009377, ecapa_loss=0.0001581, whisper_loss=0.09462, over 17126.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001379, whisper_loss=0.0899, over 3828119.72 frames. ], batch size: 70, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:47:26,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5091850.0, ans=0.2 2024-08-21 04:47:27,831 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 24 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 04:47:44,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5091950.0, ans=0.07 2024-08-21 04:48:13,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5092050.0, ans=0.0 2024-08-21 04:48:43,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5092150.0, ans=0.2 2024-08-21 04:48:56,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2024-08-21 04:48:57,116 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5450, loss[loss=0.129, beats_loss=0.00724, ecapa_loss=0.0001819, whisper_loss=0.1199, over 17043.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01027, ecapa_loss=0.0001383, whisper_loss=0.08984, over 3799019.35 frames. 
], batch size: 73, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:49:09,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5092250.0, ans=0.125 2024-08-21 04:49:24,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5092350.0, ans=0.125 2024-08-21 04:49:26,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=5092350.0, ans=22.5 2024-08-21 04:49:50,182 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-21 04:50:02,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5092450.0, ans=0.125 2024-08-21 04:50:15,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5092550.0, ans=0.0 2024-08-21 04:50:18,958 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.233e+01 2.525e+01 2.938e+01 2.405e+02, threshold=5.050e+01, percent-clipped=4.0 2024-08-21 04:50:49,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5092650.0, ans=0.0 2024-08-21 04:51:09,333 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5500, loss[loss=0.09867, beats_loss=0.008806, ecapa_loss=0.000158, whisper_loss=0.08829, over 17892.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01029, ecapa_loss=0.000138, whisper_loss=0.08979, over 3796514.39 frames. 
], batch size: 70, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:51:10,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5092750.0, ans=0.0 2024-08-21 04:51:13,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.69 vs. limit=6.0 2024-08-21 04:51:52,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5092850.0, ans=0.025 2024-08-21 04:52:26,924 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 19 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 04:52:32,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=15.0 2024-08-21 04:52:52,897 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-21 04:53:20,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5093250.0, ans=0.125 2024-08-21 04:53:20,893 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5550, loss[loss=0.1094, beats_loss=0.01043, ecapa_loss=0.000127, whisper_loss=0.09771, over 20635.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01029, ecapa_loss=0.000139, whisper_loss=0.0898, over 3801688.59 frames. 
], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:53:43,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5093250.0, ans=0.125 2024-08-21 04:53:48,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5093350.0, ans=0.125 2024-08-21 04:54:04,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5093350.0, ans=0.0 2024-08-21 04:54:21,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5093450.0, ans=0.95 2024-08-21 04:54:35,262 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 04:54:39,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5093450.0, ans=0.0 2024-08-21 04:54:48,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.248e+01 2.482e+01 2.824e+01 3.933e+01, threshold=4.964e+01, percent-clipped=0.0 2024-08-21 04:54:53,525 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 04:54:59,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5093550.0, ans=0.0 2024-08-21 04:55:24,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5093650.0, ans=0.1 2024-08-21 04:55:27,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5093650.0, ans=0.0 2024-08-21 04:55:33,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5093750.0, ans=0.125 2024-08-21 04:55:33,851 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5600, loss[loss=0.06963, beats_loss=0.01211, ecapa_loss=0.0001348, whisper_loss=0.05617, over 13528.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.08972, over 3787862.40 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:55:42,313 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 37 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 04:55:49,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=5093750.0, ans=0.02 2024-08-21 04:56:16,130 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-21 04:56:24,084 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 18 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-21 04:56:38,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.73 vs. 
limit=22.5 2024-08-21 04:57:05,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5094050.0, ans=0.125 2024-08-21 04:57:19,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5094150.0, ans=0.125 2024-08-21 04:57:33,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5094150.0, ans=0.125 2024-08-21 04:57:35,948 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5650, loss[loss=0.09292, beats_loss=0.01209, ecapa_loss=0.0001247, whisper_loss=0.07958, over 21093.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001377, whisper_loss=0.08992, over 3791718.69 frames. ], batch size: 85, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:57:37,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5094250.0, ans=0.2 2024-08-21 04:57:39,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5094250.0, ans=0.1 2024-08-21 04:58:21,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-21 04:58:21,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.09 vs. 
limit=10.0 2024-08-21 04:58:49,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.209e+01 2.456e+01 2.705e+01 6.075e+01, threshold=4.911e+01, percent-clipped=1.0 2024-08-21 04:59:06,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5094550.0, ans=0.0 2024-08-21 04:59:24,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5094650.0, ans=0.0 2024-08-21 04:59:34,814 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5700, loss[loss=0.0878, beats_loss=0.01292, ecapa_loss=0.0001313, whisper_loss=0.07357, over 17759.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.08993, over 3807453.37 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:59:47,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-08-21 04:59:57,341 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-21 04:59:59,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5094850.0, ans=0.125 2024-08-21 05:00:02,650 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 25 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-21 05:00:35,428 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-21 05:00:53,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-21 05:01:11,704 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 05:01:13,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5095150.0, ans=0.125 2024-08-21 05:01:16,439 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 25 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-21 05:01:25,339 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5750, loss[loss=0.08918, beats_loss=0.01066, ecapa_loss=0.0001593, whisper_loss=0.07692, over 21597.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.08922, over 3833808.09 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:01:29,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5095250.0, ans=0.0 2024-08-21 05:01:36,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5095250.0, ans=0.1 2024-08-21 05:01:42,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5095250.0, ans=0.0 2024-08-21 05:02:01,703 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-21 05:02:07,481 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.125e+01 2024-08-21 05:02:12,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5095450.0, ans=0.125 2024-08-21 05:02:18,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5095450.0, ans=0.1 2024-08-21 05:02:19,395 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
23 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-21 05:02:38,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5095550.0, ans=0.2 2024-08-21 05:02:38,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.297e+01 2.497e+01 2.736e+01 4.299e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-21 05:03:03,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2024-08-21 05:03:19,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5095750.0, ans=0.125 2024-08-21 05:03:20,582 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5800, loss[loss=0.1034, beats_loss=0.008942, ecapa_loss=0.00015, whisper_loss=0.093, over 22211.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.000139, whisper_loss=0.08896, over 3842819.12 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:03:24,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2024-08-21 05:03:26,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5095750.0, ans=0.125 2024-08-21 05:03:40,900 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 05:03:45,686 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 05:03:47,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5095850.0, ans=0.0 2024-08-21 05:03:50,425 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 05:03:54,911 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 20 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 05:04:03,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-21 05:04:12,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5095950.0, ans=0.0 2024-08-21 05:04:59,203 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 25 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 05:05:11,383 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5850, loss[loss=0.07662, beats_loss=0.01355, ecapa_loss=0.0001177, whisper_loss=0.06189, over 21579.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01044, ecapa_loss=0.000139, whisper_loss=0.08861, over 3869632.97 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:05:21,795 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 17 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-21 05:05:36,101 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 29 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 05:05:39,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5096350.0, ans=0.0 2024-08-21 05:05:46,468 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 05:05:58,925 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-08-21 05:05:59,442 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
29 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-21 05:06:00,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2024-08-21 05:06:05,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5096450.0, ans=0.0 2024-08-21 05:06:14,565 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.298e+01 2.570e+01 2.824e+01 3.912e+01, threshold=5.140e+01, percent-clipped=0.0 2024-08-21 05:06:22,714 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-21 05:06:24,579 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-21 05:06:36,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5096650.0, ans=0.1 2024-08-21 05:06:50,867 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5900, loss[loss=0.08859, beats_loss=0.01458, ecapa_loss=0.0001097, whisper_loss=0.07291, over 23320.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01035, ecapa_loss=0.0001397, whisper_loss=0.08914, over 3863595.83 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:06:56,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2024-08-21 05:06:59,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.43 vs. 
limit=15.0 2024-08-21 05:07:15,245 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:07:17,385 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.498e-01 2024-08-21 05:07:22,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5096850.0, ans=0.1 2024-08-21 05:07:24,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5096850.0, ans=0.2 2024-08-21 05:07:26,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-08-21 05:07:32,959 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 05:07:37,391 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 22 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-21 05:07:41,649 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-21 05:08:02,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5097050.0, ans=0.0 2024-08-21 05:08:24,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5097150.0, ans=0.0 2024-08-21 05:08:35,848 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 5950, loss[loss=0.08273, beats_loss=0.0132, ecapa_loss=0.0001179, whisper_loss=0.06836, over 16426.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.08849, over 3832517.46 frames. 
], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:08:45,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.71 vs. limit=15.0 2024-08-21 05:09:09,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.24 vs. limit=10.0 2024-08-21 05:09:24,468 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 21 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-21 05:09:31,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5097450.0, ans=0.125 2024-08-21 05:09:43,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.327e+01 2.629e+01 2.901e+01 4.645e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-21 05:09:45,078 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 23 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-21 05:09:49,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=5097550.0, ans=15.0 2024-08-21 05:10:02,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5097650.0, ans=0.0 2024-08-21 05:10:16,081 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6000, loss[loss=0.102, beats_loss=0.01084, ecapa_loss=0.0001518, whisper_loss=0.08965, over 19132.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.0888, over 3838421.06 frames. 
], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:10:16,082 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-21 05:10:54,003 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005022, whisper_loss=0.2487, over 931116.00 frames. 2024-08-21 05:11:19,511 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on SV_voxceleb1: loss=0.003907, beats_loss=0, ecapa_loss=0.0003907, whisper_loss=0, over 944235.00 frames. 2024-08-21 05:12:17,901 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.8481, 1.6575, 1.8773, 1.2597, 1.4846, 2.0163, 2.4972, 1.5885], device='cuda:3') 2024-08-21 05:13:03,025 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 05:13:03,030 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-21 05:13:10,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-21 05:13:13,666 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 05:13:17,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-08-21 05:13:25,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=5097850.0, ans=15.0 2024-08-21 05:13:27,138 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 05:13:34,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2024-08-21 05:13:34,942 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 12 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-21 05:13:36,955 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-21 05:14:02,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5098050.0, ans=0.125 2024-08-21 05:14:08,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5098050.0, ans=0.0 2024-08-21 05:14:08,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5098050.0, ans=0.1 2024-08-21 05:14:17,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5098150.0, ans=0.1 2024-08-21 05:14:37,230 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6050, loss[loss=0.1079, beats_loss=0.01128, ecapa_loss=0.0001543, whisper_loss=0.09504, over 21966.00 frames. ], tot_loss[loss=0.0999, beats_loss=0.0105, ecapa_loss=0.0001386, whisper_loss=0.08801, over 3825865.56 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:14:43,274 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-21 05:14:50,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5098250.0, ans=0.125 2024-08-21 05:15:03,303 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 
15 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-21 05:15:36,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5098550.0, ans=0.025 2024-08-21 05:15:39,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.180e+01 2.501e+01 2.659e+01 4.695e+01, threshold=5.002e+01, percent-clipped=0.0 2024-08-21 05:15:42,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5098550.0, ans=0.0 2024-08-21 05:15:45,204 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.670e-03 2024-08-21 05:15:48,490 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-21 05:16:11,349 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 05:16:12,977 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6100, loss[loss=0.09704, beats_loss=0.0109, ecapa_loss=0.0001354, whisper_loss=0.08478, over 22293.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01058, ecapa_loss=0.0001374, whisper_loss=0.08845, over 3857952.55 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:16:22,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0 2024-08-21 05:17:02,170 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 20 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-21 05:17:10,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2024-08-21 05:17:18,367 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.51 vs. 
limit=22.5 2024-08-21 05:17:45,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5099150.0, ans=0.1 2024-08-21 05:17:52,001 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6150, loss[loss=0.1108, beats_loss=0.01094, ecapa_loss=0.0001347, whisper_loss=0.09854, over 22757.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01073, ecapa_loss=0.0001363, whisper_loss=0.08814, over 3842769.76 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:18:10,378 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 27 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-21 05:18:14,391 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 05:18:53,364 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.269e+01 2.480e+01 2.871e+01 4.819e+02, threshold=4.960e+01, percent-clipped=2.0 2024-08-21 05:19:01,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5099550.0, ans=0.125 2024-08-21 05:19:09,971 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 05:19:26,653 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6200, loss[loss=0.09464, beats_loss=0.01016, ecapa_loss=0.0001475, whisper_loss=0.08301, over 22186.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01071, ecapa_loss=0.0001374, whisper_loss=0.08869, over 3850144.01 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:19:39,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5099750.0, ans=0.125 2024-08-21 05:19:42,487 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 05:19:43,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5099750.0, ans=0.125 2024-08-21 05:19:51,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5099850.0, ans=0.1 2024-08-21 05:19:56,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5 2024-08-21 05:19:57,510 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 24 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 05:20:03,580 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 20 from LS+wenet, 8 from Vox, 23 fro AS 2024-08-21 05:20:10,293 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-21 05:20:12,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2024-08-21 05:20:24,511 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 18 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-21 05:20:42,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-21 05:21:03,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5100150.0, ans=0.125 2024-08-21 05:21:08,198 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6250, loss[loss=0.1053, beats_loss=0.01061, ecapa_loss=0.0001337, whisper_loss=0.09333, over 22729.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0107, ecapa_loss=0.0001377, whisper_loss=0.08834, over 3807030.75 frames. 
], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:21:09,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5100250.0, ans=0.125 2024-08-21 05:21:13,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5100250.0, ans=0.1 2024-08-21 05:21:15,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.70 vs. limit=10.0 2024-08-21 05:21:25,267 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-21 05:21:48,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5100450.0, ans=0.0 2024-08-21 05:21:54,299 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:21:57,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.01 vs. limit=10.0 2024-08-21 05:22:12,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.226e+01 2.520e+01 2.834e+01 9.847e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-21 05:22:15,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5100550.0, ans=0.125 2024-08-21 05:22:19,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5100550.0, ans=0.0 2024-08-21 05:22:31,234 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
32 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-21 05:22:49,578 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6300, loss[loss=0.0879, beats_loss=0.009353, ecapa_loss=0.0001425, whisper_loss=0.07712, over 14584.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01065, ecapa_loss=0.0001382, whisper_loss=0.0885, over 3790819.31 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:23:03,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=12.0 2024-08-21 05:23:10,286 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-21 05:23:26,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2024-08-21 05:23:34,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5100950.0, ans=0.2 2024-08-21 05:23:43,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5100950.0, ans=0.125 2024-08-21 05:24:01,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5101050.0, ans=0.2 2024-08-21 05:24:02,076 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
17 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-21 05:24:02,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5101050.0, ans=0.0 2024-08-21 05:24:08,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5101150.0, ans=0.125 2024-08-21 05:24:26,801 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6350, loss[loss=0.08762, beats_loss=0.0119, ecapa_loss=0.0001217, whisper_loss=0.07451, over 14757.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0106, ecapa_loss=0.000139, whisper_loss=0.08845, over 3818939.12 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:24:36,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5101250.0, ans=0.125 2024-08-21 05:24:42,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-08-21 05:25:11,251 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-21 05:25:21,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2024-08-21 05:25:26,329 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
18 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-21 05:25:29,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5101550.0, ans=0.1 2024-08-21 05:25:30,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.267e+01 2.496e+01 2.800e+01 3.336e+01, threshold=4.993e+01, percent-clipped=1.0 2024-08-21 05:25:37,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5101550.0, ans=0.07 2024-08-21 05:26:00,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=5101650.0, ans=15.0 2024-08-21 05:26:04,601 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6400, loss[loss=0.05487, beats_loss=0.01497, ecapa_loss=0.000125, whisper_loss=0.03865, over 15024.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01056, ecapa_loss=0.0001389, whisper_loss=0.08828, over 3810473.19 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:26:07,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5101750.0, ans=0.125 2024-08-21 05:26:13,259 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0562300942838192, model_norm_threshold=49.92792510986328 2024-08-21 05:26:13,431 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.1.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.997e+04, grad_sumsq=6.997e+04, orig_rms_sq=1.000e+00 2024-08-21 05:26:28,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=5101850.0, ans=0.025 2024-08-21 05:26:41,722 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 05:27:08,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5102050.0, ans=0.0 2024-08-21 05:27:37,082 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6450, loss[loss=0.09426, beats_loss=0.0125, ecapa_loss=0.0001263, whisper_loss=0.0805, over 16023.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0105, ecapa_loss=0.00014, whisper_loss=0.08884, over 3769637.63 frames. ], batch size: 65, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:27:54,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5102350.0, ans=0.125 2024-08-21 05:27:58,017 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:27:58,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5102350.0, ans=0.125 2024-08-21 05:28:08,910 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 05:28:17,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.43 vs. limit=10.0 2024-08-21 05:28:34,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.209e+01 2.498e+01 2.911e+01 8.879e+02, threshold=4.995e+01, percent-clipped=1.0 2024-08-21 05:28:35,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5102550.0, ans=0.0 2024-08-21 05:28:42,595 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 05:28:47,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5102550.0, ans=0.0 2024-08-21 05:29:06,502 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 05:29:07,903 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6500, loss[loss=0.09941, beats_loss=0.01125, ecapa_loss=0.000144, whisper_loss=0.08673, over 22069.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01054, ecapa_loss=0.0001389, whisper_loss=0.08888, over 3781431.89 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:29:13,908 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-21 05:29:32,700 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 05:29:50,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5102950.0, ans=0.0 2024-08-21 05:29:58,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5102950.0, ans=0.1 2024-08-21 05:30:31,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5103150.0, ans=0.0 2024-08-21 05:30:31,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5103150.0, ans=0.125 2024-08-21 05:30:37,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5103150.0, ans=0.025 2024-08-21 05:30:39,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5103150.0, ans=0.2 2024-08-21 05:30:50,205 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, 
batch 6550, loss[loss=0.1154, beats_loss=0.01031, ecapa_loss=0.0001345, whisper_loss=0.1037, over 22436.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.08906, over 3782050.60 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:31:02,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5103250.0, ans=0.125 2024-08-21 05:31:02,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5103250.0, ans=0.2 2024-08-21 05:31:19,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5103350.0, ans=0.125 2024-08-21 05:31:27,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=5103350.0, ans=0.05 2024-08-21 05:31:34,548 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-21 05:31:36,411 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 05:31:43,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5103450.0, ans=0.0 2024-08-21 05:31:59,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.280e+01 2.467e+01 2.785e+01 3.437e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-21 05:32:02,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5103550.0, ans=0.125 2024-08-21 05:32:05,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.83 vs. 
limit=12.0 2024-08-21 05:32:21,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5103650.0, ans=0.125 2024-08-21 05:32:23,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5103650.0, ans=0.0 2024-08-21 05:32:31,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5103650.0, ans=0.125 2024-08-21 05:32:32,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-21 05:32:33,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5103750.0, ans=0.125 2024-08-21 05:32:34,103 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6600, loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001285, whisper_loss=0.09131, over 17155.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001399, whisper_loss=0.08975, over 3834731.30 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:32:53,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5103850.0, ans=0.125 2024-08-21 05:33:35,317 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-21 05:34:05,744 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 05:34:13,042 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6650, loss[loss=0.106, beats_loss=0.009009, ecapa_loss=0.0001581, whisper_loss=0.09539, over 22153.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.09002, over 3813812.95 frames. 
], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:34:14,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=22.5 2024-08-21 05:34:30,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5104250.0, ans=0.04949747468305833 2024-08-21 05:34:51,795 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.194e+00 2024-08-21 05:34:54,308 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 05:34:55,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5104450.0, ans=0.125 2024-08-21 05:35:10,622 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 05:35:15,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.320e+01 2.541e+01 2.816e+01 4.422e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-21 05:35:25,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2024-08-21 05:35:34,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5104650.0, ans=0.0 2024-08-21 05:35:51,148 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6700, loss[loss=0.09939, beats_loss=0.01205, ecapa_loss=0.0001275, whisper_loss=0.08606, over 21388.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.08998, over 3823508.52 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:36:29,302 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
25 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 05:36:33,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5104950.0, ans=0.125 2024-08-21 05:36:37,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5104950.0, ans=0.09899494936611666 2024-08-21 05:36:42,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5104950.0, ans=0.0 2024-08-21 05:37:27,726 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6750, loss[loss=0.1024, beats_loss=0.01073, ecapa_loss=0.0001356, whisper_loss=0.09033, over 21966.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.09095, over 3869871.78 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:37:38,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.57 vs. limit=22.5 2024-08-21 05:37:49,929 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 25 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-21 05:37:59,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5105350.0, ans=0.1 2024-08-21 05:38:01,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5105350.0, ans=0.5 2024-08-21 05:38:08,068 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 31 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 05:38:25,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.375e+01 2.599e+01 2.848e+01 3.757e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-21 05:38:25,999 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
30 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-21 05:38:59,461 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6800, loss[loss=0.1114, beats_loss=0.01015, ecapa_loss=0.0001722, whisper_loss=0.09957, over 22534.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001406, whisper_loss=0.09122, over 3880171.59 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:39:03,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5105750.0, ans=0.1 2024-08-21 05:39:12,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. limit=10.0 2024-08-21 05:39:22,658 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-21 05:39:24,415 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 19 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-21 05:39:32,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5105850.0, ans=0.0 2024-08-21 05:39:49,821 INFO [train_multi_KD3.py:845] (3/4) A total of 96 cuts. 28 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-21 05:39:53,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5105950.0, ans=0.0 2024-08-21 05:40:00,213 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-21 05:40:09,347 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 24 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 05:40:11,025 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-21 05:40:31,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5106150.0, ans=0.0 2024-08-21 05:40:33,861 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6850, loss[loss=0.0936, beats_loss=0.01385, ecapa_loss=0.0001024, whisper_loss=0.07872, over 22583.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001404, whisper_loss=0.0906, over 3888934.61 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:40:47,933 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 13 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-21 05:41:02,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5106350.0, ans=0.125 2024-08-21 05:41:02,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5106350.0, ans=0.04949747468305833 2024-08-21 05:41:09,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5106450.0, ans=0.2 2024-08-21 05:41:28,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5106550.0, ans=0.125 2024-08-21 05:41:32,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.327e+01 2.654e+01 2.944e+01 2.744e+02, threshold=5.308e+01, percent-clipped=2.0 2024-08-21 05:41:47,282 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-21 05:41:54,970 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-21 05:42:05,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5106750.0, ans=0.0 2024-08-21 05:42:05,957 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6900, loss[loss=0.09555, beats_loss=0.009525, ecapa_loss=0.0001394, whisper_loss=0.08463, over 20825.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001399, whisper_loss=0.09064, over 3870710.82 frames. ], batch size: 83, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:42:06,238 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 19 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-21 05:42:21,957 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-21 05:42:26,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5106850.0, ans=0.035 2024-08-21 05:42:36,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5106850.0, ans=0.0 2024-08-21 05:42:44,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=8.0 2024-08-21 05:42:58,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2024-08-21 05:43:07,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. 
limit=15.0 2024-08-21 05:43:11,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5107050.0, ans=0.2 2024-08-21 05:43:16,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5107150.0, ans=0.125 2024-08-21 05:43:29,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2024-08-21 05:43:32,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5107150.0, ans=0.125 2024-08-21 05:43:35,497 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 6950, loss[loss=0.1118, beats_loss=0.00925, ecapa_loss=0.0001431, whisper_loss=0.1012, over 21912.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.08986, over 3885289.14 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:44:05,337 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.712e+00 2024-08-21 05:44:08,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5107350.0, ans=0.125 2024-08-21 05:44:14,955 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-21 05:44:28,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5107550.0, ans=0.125 2024-08-21 05:44:32,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.277e+01 2.529e+01 2.921e+01 4.469e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 05:44:48,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5107650.0, ans=0.2 2024-08-21 05:44:58,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5107650.0, ans=0.0 2024-08-21 05:45:06,285 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7000, loss[loss=0.1243, beats_loss=0.008325, ecapa_loss=0.0001196, whisper_loss=0.1148, over 16451.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.08973, over 3878997.42 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:45:17,795 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-21 05:45:18,558 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=7.568e-03 2024-08-21 05:45:18,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-21 05:45:21,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5107750.0, ans=0.125 2024-08-21 05:45:24,573 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 
28 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-21 05:45:29,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5107850.0, ans=0.0 2024-08-21 05:45:37,824 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-21 05:45:50,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5107950.0, ans=0.125 2024-08-21 05:45:50,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-21 05:45:51,324 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 18 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-21 05:45:55,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5107950.0, ans=0.1 2024-08-21 05:45:57,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-08-21 05:46:06,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5108050.0, ans=0.1 2024-08-21 05:46:38,982 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7050, loss[loss=0.08448, beats_loss=0.01063, ecapa_loss=0.0001025, whisper_loss=0.07282, over 16144.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001395, whisper_loss=0.08921, over 3873226.01 frames. 
], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:46:53,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5108250.0, ans=0.1 2024-08-21 05:46:56,214 WARNING [optim.py:496] (3/4) Scaling gradients by 0.049437928944826126, model_norm_threshold=50.57056427001953 2024-08-21 05:46:56,382 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.007e+05, grad_sumsq=1.862e+07, orig_rms_sq=1.077e-02 2024-08-21 05:46:57,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5108350.0, ans=0.0 2024-08-21 05:47:07,159 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-21 05:47:07,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5108350.0, ans=0.0 2024-08-21 05:47:09,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. 
limit=22.5 2024-08-21 05:47:13,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5108450.0, ans=0.04949747468305833 2024-08-21 05:47:30,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5108550.0, ans=0.0 2024-08-21 05:47:34,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.310e+01 2.518e+01 2.782e+01 1.023e+03, threshold=5.036e+01, percent-clipped=2.0 2024-08-21 05:47:53,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5108650.0, ans=0.125 2024-08-21 05:47:58,433 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07700152695178986, model_norm_threshold=50.36127471923828 2024-08-21 05:47:58,602 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.694e+04, grad_sumsq=7.694e+04, orig_rms_sq=1.000e+00 2024-08-21 05:47:59,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.84 vs. limit=15.0 2024-08-21 05:48:05,796 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 34 from Vox, 30 fro AS 2024-08-21 05:48:07,497 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7100, loss[loss=0.1051, beats_loss=0.009199, ecapa_loss=0.0002151, whisper_loss=0.0937, over 21698.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01053, ecapa_loss=0.0001406, whisper_loss=0.08928, over 3837899.05 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:48:27,618 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 
30 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-21 05:48:48,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5108950.0, ans=0.0 2024-08-21 05:49:02,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5109050.0, ans=0.1 2024-08-21 05:49:09,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2024-08-21 05:49:12,230 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 05:49:15,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=15.0 2024-08-21 05:49:15,856 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 05:49:23,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5109150.0, ans=0.0 2024-08-21 05:49:36,241 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7150, loss[loss=0.09807, beats_loss=0.01081, ecapa_loss=0.0001256, whisper_loss=0.08601, over 17145.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01053, ecapa_loss=0.000141, whisper_loss=0.08921, over 3810033.29 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:49:44,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-21 05:49:53,980 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
13 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 05:50:10,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5109450.0, ans=0.1 2024-08-21 05:50:32,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.268e+01 2.455e+01 2.616e+01 6.540e+02, threshold=4.909e+01, percent-clipped=2.0 2024-08-21 05:50:45,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=5109550.0, ans=15.0 2024-08-21 05:50:54,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5109650.0, ans=0.125 2024-08-21 05:51:01,076 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 05:51:05,959 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7200, loss[loss=0.09239, beats_loss=0.009158, ecapa_loss=0.0001512, whisper_loss=0.08172, over 15116.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.08933, over 3820105.45 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:51:54,462 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-21 05:52:05,538 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 05:52:20,039 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 05:52:26,762 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-21 05:52:36,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5110250.0, ans=0.1 2024-08-21 05:52:37,644 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7250, loss[loss=0.1075, beats_loss=0.0112, ecapa_loss=9.509e-05, whisper_loss=0.09531, over 18400.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.08927, over 3797462.28 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:52:50,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5110250.0, ans=0.09899494936611666 2024-08-21 05:52:52,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5110250.0, ans=0.125 2024-08-21 05:53:13,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-08-21 05:53:23,095 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
17 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 05:53:26,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5110450.0, ans=0.125 2024-08-21 05:53:35,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.241e+01 2.447e+01 2.768e+01 8.311e+01, threshold=4.894e+01, percent-clipped=2.0 2024-08-21 05:53:36,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5110550.0, ans=0.1 2024-08-21 05:53:40,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5110550.0, ans=0.0 2024-08-21 05:53:58,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5110650.0, ans=0.1 2024-08-21 05:54:07,922 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7300, loss[loss=0.1117, beats_loss=0.01013, ecapa_loss=0.0001344, whisper_loss=0.1002, over 15962.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.000139, whisper_loss=0.08914, over 3786072.41 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:54:13,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5110750.0, ans=0.2 2024-08-21 05:54:18,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5110750.0, ans=0.125 2024-08-21 05:54:19,889 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 29 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 05:54:47,150 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-21 05:54:52,374 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 05:55:06,489 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 22 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-21 05:55:36,595 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7350, loss[loss=0.1098, beats_loss=0.01007, ecapa_loss=0.00015, whisper_loss=0.09825, over 22067.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0103, ecapa_loss=0.0001398, whisper_loss=0.08966, over 3797994.12 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:55:43,165 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.173e-02 2024-08-21 05:55:44,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5111250.0, ans=0.2 2024-08-21 05:55:47,825 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 05:55:56,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5111350.0, ans=0.0 2024-08-21 05:55:59,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=5111350.0, ans=0.0 2024-08-21 05:56:06,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-21 05:56:08,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5111350.0, ans=0.0 2024-08-21 05:56:21,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.05 vs. limit=10.0 2024-08-21 05:56:22,061 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 
21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-21 05:56:25,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5111450.0, ans=0.1 2024-08-21 05:56:27,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5111450.0, ans=0.0 2024-08-21 05:56:33,609 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.293e+01 2.578e+01 2.831e+01 4.096e+01, threshold=5.157e+01, percent-clipped=0.0 2024-08-21 05:57:04,677 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7400, loss[loss=0.1039, beats_loss=0.008983, ecapa_loss=0.0001546, whisper_loss=0.09336, over 16213.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01021, ecapa_loss=0.0001392, whisper_loss=0.09022, over 3791679.76 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:57:17,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=12.0 2024-08-21 05:57:37,527 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 28 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-21 05:57:39,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5111850.0, ans=0.1 2024-08-21 05:57:40,928 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08977154642343521, model_norm_threshold=51.56612014770508 2024-08-21 05:57:41,097 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.273e+04, grad_sumsq=4.273e+04, orig_rms_sq=1.000e+00 2024-08-21 05:57:55,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. 
limit=15.0 2024-08-21 05:58:03,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5112050.0, ans=0.0 2024-08-21 05:58:22,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5112150.0, ans=0.125 2024-08-21 05:58:25,286 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 37 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 05:58:34,087 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7450, loss[loss=0.1168, beats_loss=0.01015, ecapa_loss=0.0001395, whisper_loss=0.1052, over 21182.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01019, ecapa_loss=0.0001391, whisper_loss=0.09089, over 3789182.44 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:58:34,259 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 23 from LS+wenet, 6 from Vox, 27 fro AS 2024-08-21 05:58:37,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5112250.0, ans=0.125 2024-08-21 05:58:40,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5112250.0, ans=0.125 2024-08-21 05:58:40,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5112250.0, ans=0.0 2024-08-21 05:58:47,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5112250.0, ans=0.125 2024-08-21 05:58:54,039 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-21 05:59:06,604 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 05:59:11,952 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 
20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-21 05:59:31,715 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.315e+01 2.613e+01 3.029e+01 5.744e+02, threshold=5.226e+01, percent-clipped=1.0 2024-08-21 05:59:57,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5112650.0, ans=0.1 2024-08-21 05:59:57,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5112650.0, ans=0.0 2024-08-21 06:00:03,586 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7500, loss[loss=0.1021, beats_loss=0.01074, ecapa_loss=0.0001156, whisper_loss=0.09023, over 19069.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01028, ecapa_loss=0.0001383, whisper_loss=0.09058, over 3812673.89 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:00:08,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.07 vs. limit=22.5 2024-08-21 06:00:13,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5112750.0, ans=0.0 2024-08-21 06:00:48,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5112950.0, ans=0.125 2024-08-21 06:01:00,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-08-21 06:01:34,461 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7550, loss[loss=0.107, beats_loss=0.008308, ecapa_loss=0.0001513, whisper_loss=0.09714, over 16225.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.08984, over 3789669.32 frames. 
], batch size: 62, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 06:01:40,989 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-21 06:01:48,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5113250.0, ans=0.0 2024-08-21 06:02:02,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2024-08-21 06:02:06,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5113350.0, ans=0.1 2024-08-21 06:02:20,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5113450.0, ans=0.2 2024-08-21 06:02:21,394 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 33 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 06:02:36,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.241e+01 2.500e+01 2.791e+01 3.634e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-21 06:02:37,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=5113550.0, ans=0.05 2024-08-21 06:02:57,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-08-21 06:03:07,992 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7600, loss[loss=0.1149, beats_loss=0.008461, ecapa_loss=0.0001716, whisper_loss=0.1048, over 23030.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.00014, whisper_loss=0.09011, over 3807261.96 frames. 
], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:03:11,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5113750.0, ans=0.125 2024-08-21 06:03:26,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5113850.0, ans=0.125 2024-08-21 06:03:45,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=8.0 2024-08-21 06:03:46,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5113950.0, ans=0.125 2024-08-21 06:03:46,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5113950.0, ans=0.125 2024-08-21 06:03:52,901 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 29 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-21 06:03:56,699 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-21 06:04:10,259 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-21 06:04:17,020 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-21 06:04:32,507 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 27 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 06:04:41,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5114250.0, ans=0.125 2024-08-21 06:04:42,291 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7650, loss[loss=0.09977, beats_loss=0.01064, ecapa_loss=0.0001256, whisper_loss=0.08787, over 19047.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01027, ecapa_loss=0.0001405, whisper_loss=0.09023, over 3805292.25 frames. ], batch size: 75, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:04:49,795 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 22 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-21 06:04:57,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5114250.0, ans=0.125 2024-08-21 06:05:18,033 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 20 from LS+wenet, 34 from Vox, 27 fro AS 2024-08-21 06:05:20,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5114450.0, ans=0.1 2024-08-21 06:05:42,514 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2024-08-21 06:05:42,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.284e+01 2.478e+01 2.742e+01 4.351e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-21 06:06:03,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5114650.0, ans=0.0 2024-08-21 06:06:08,562 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 26 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-21 06:06:13,472 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7700, loss[loss=0.09112, beats_loss=0.009664, ecapa_loss=0.0001719, whisper_loss=0.07974, over 16675.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01021, ecapa_loss=0.0001416, whisper_loss=0.08981, over 3780689.33 frames. 
], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:06:17,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5114750.0, ans=0.125 2024-08-21 06:06:17,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5114750.0, ans=0.0 2024-08-21 06:06:19,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5114750.0, ans=0.125 2024-08-21 06:06:19,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5114750.0, ans=0.1 2024-08-21 06:06:33,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5114750.0, ans=0.125 2024-08-21 06:06:33,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5114750.0, ans=0.125 2024-08-21 06:06:44,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5114850.0, ans=0.125 2024-08-21 06:07:05,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5 2024-08-21 06:07:58,856 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7750, loss[loss=0.08578, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.07379, over 20122.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01023, ecapa_loss=0.0001408, whisper_loss=0.09009, over 3805398.25 frames. 
], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:08:04,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5115250.0, ans=0.2 2024-08-21 06:08:27,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5115350.0, ans=0.1 2024-08-21 06:08:33,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5115350.0, ans=0.125 2024-08-21 06:08:34,771 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 27 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 06:08:42,665 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 35 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-21 06:08:51,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0 2024-08-21 06:08:52,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5115450.0, ans=0.125 2024-08-21 06:09:03,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.270e+01 2.577e+01 2.902e+01 8.135e+01, threshold=5.155e+01, percent-clipped=1.0 2024-08-21 06:09:13,728 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 06:09:30,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5115650.0, ans=0.125 2024-08-21 06:09:34,807 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7800, loss[loss=0.1329, beats_loss=0.006442, ecapa_loss=0.0001374, whisper_loss=0.1251, over 18042.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01022, ecapa_loss=0.00014, whisper_loss=0.09026, over 3823091.58 frames. 
], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:09:35,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=5115750.0, ans=0.025 2024-08-21 06:09:38,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5115750.0, ans=0.5 2024-08-21 06:09:53,526 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-21 06:10:01,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5115850.0, ans=0.125 2024-08-21 06:10:10,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.80 vs. limit=15.0 2024-08-21 06:10:11,175 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 12 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-21 06:10:12,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5115950.0, ans=0.125 2024-08-21 06:10:24,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5115950.0, ans=0.1 2024-08-21 06:10:30,914 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-21 06:10:57,601 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-21 06:11:10,103 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7850, loss[loss=0.1261, beats_loss=0.007424, ecapa_loss=0.0001407, whisper_loss=0.1172, over 22268.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01016, ecapa_loss=0.0001396, whisper_loss=0.09051, over 3826012.92 frames. 
], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:11:10,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5116250.0, ans=0.125 2024-08-21 06:11:29,213 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 20 from LS+wenet, 9 from Vox, 23 fro AS 2024-08-21 06:11:41,887 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 22 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-21 06:11:56,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5116450.0, ans=0.0 2024-08-21 06:12:06,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2024-08-21 06:12:08,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2024-08-21 06:12:08,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.271e+01 2.419e+01 2.702e+01 3.999e+01, threshold=4.838e+01, percent-clipped=0.0 2024-08-21 06:12:12,593 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-21 06:12:19,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5116550.0, ans=0.125 2024-08-21 06:12:21,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5116650.0, ans=0.125 2024-08-21 06:12:33,165 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 06:12:41,180 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7900, loss[loss=0.1154, beats_loss=0.01004, ecapa_loss=0.0001415, whisper_loss=0.104, over 21093.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01026, ecapa_loss=0.0001387, whisper_loss=0.08975, over 3820726.58 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:13:13,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5116850.0, ans=0.125 2024-08-21 06:13:26,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5116950.0, ans=0.0 2024-08-21 06:13:31,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5116950.0, ans=0.125 2024-08-21 06:13:33,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5117050.0, ans=0.125 2024-08-21 06:14:05,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5117150.0, ans=0.1 2024-08-21 06:14:10,364 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 7950, loss[loss=0.07955, beats_loss=0.01182, ecapa_loss=0.0001664, whisper_loss=0.06606, over 19376.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01019, ecapa_loss=0.0001386, whisper_loss=0.08998, over 3800960.21 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:14:15,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5117250.0, ans=0.04949747468305833 2024-08-21 06:14:21,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5117250.0, ans=0.0 2024-08-21 06:14:28,704 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-08-21 06:14:50,762 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-21 06:14:54,219 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07848511636257172, model_norm_threshold=48.37834167480469 2024-08-21 06:14:54,387 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.905e+04, grad_sumsq=8.269e+06, orig_rms_sq=1.077e-02 2024-08-21 06:15:06,209 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.240e+01 2.515e+01 2.710e+01 6.164e+02, threshold=5.030e+01, percent-clipped=3.0 2024-08-21 06:15:37,034 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8000, loss[loss=0.0876, beats_loss=0.01051, ecapa_loss=0.0001446, whisper_loss=0.07564, over 16999.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01021, ecapa_loss=0.0001382, whisper_loss=0.08987, over 3771400.23 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:15:54,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2024-08-21 06:16:02,829 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-21 06:16:05,197 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-21 06:16:09,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.90 vs. limit=22.5 2024-08-21 06:16:19,175 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 06:16:21,170 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 15 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-21 06:16:25,736 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 
25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 06:16:46,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5118050.0, ans=0.2 2024-08-21 06:17:04,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5118250.0, ans=0.125 2024-08-21 06:17:05,643 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8050, loss[loss=0.1129, beats_loss=0.009424, ecapa_loss=0.0001394, whisper_loss=0.102, over 23106.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01019, ecapa_loss=0.0001388, whisper_loss=0.09065, over 3776171.30 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:17:20,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.81 vs. limit=22.5 2024-08-21 06:17:36,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5118350.0, ans=0.1 2024-08-21 06:17:36,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=5118350.0, ans=0.1 2024-08-21 06:17:41,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5118450.0, ans=0.125 2024-08-21 06:17:45,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5118450.0, ans=0.125 2024-08-21 06:17:47,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5118450.0, ans=0.125 2024-08-21 06:17:57,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5118550.0, ans=0.125 2024-08-21 06:18:01,000 INFO [scaling.py:214] (3/4) 
ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5118550.0, ans=0.125 2024-08-21 06:18:03,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.278e+01 2.668e+01 2.870e+01 8.505e+01, threshold=5.336e+01, percent-clipped=1.0 2024-08-21 06:18:34,979 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8100, loss[loss=0.094, beats_loss=0.01205, ecapa_loss=0.0001044, whisper_loss=0.0809, over 13110.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001388, whisper_loss=0.09044, over 3766877.20 frames. ], batch size: 50, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:18:35,221 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-21 06:18:35,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5118750.0, ans=0.2 2024-08-21 06:18:39,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2024-08-21 06:18:39,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. limit=6.0 2024-08-21 06:18:57,535 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2024-08-21 06:18:58,320 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 13 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-21 06:19:11,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5118950.0, ans=0.125 2024-08-21 06:19:34,691 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-21 06:19:40,700 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-21 06:19:48,670 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08069697767496109, model_norm_threshold=53.36321258544922 2024-08-21 06:19:48,832 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.079e+04, grad_sumsq=7.079e+04, orig_rms_sq=1.000e+00 2024-08-21 06:19:59,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5119150.0, ans=0.125 2024-08-21 06:20:02,408 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8150, loss[loss=0.1181, beats_loss=0.00819, ecapa_loss=0.0001629, whisper_loss=0.1083, over 16656.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001375, whisper_loss=0.09029, over 3735551.92 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:20:02,646 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 38 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-21 06:20:08,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5119250.0, ans=0.0 2024-08-21 06:20:19,863 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 06:20:21,591 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 18 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-21 06:20:30,412 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 
21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 06:20:39,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5119450.0, ans=0.0 2024-08-21 06:20:40,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5119450.0, ans=0.025 2024-08-21 06:20:58,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.296e+01 2.463e+01 2.711e+01 6.613e+02, threshold=4.926e+01, percent-clipped=2.0 2024-08-21 06:21:16,181 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-21 06:21:27,803 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8200, loss[loss=0.09286, beats_loss=0.01203, ecapa_loss=0.0001609, whisper_loss=0.07922, over 17608.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.08974, over 3747083.79 frames. ], batch size: 74, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:21:44,394 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-21 06:21:56,037 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 06:22:29,037 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 21 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-21 06:22:38,356 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 21 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-21 06:22:57,646 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8250, loss[loss=0.09462, beats_loss=0.01131, ecapa_loss=0.0001122, whisper_loss=0.08219, over 13740.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001379, whisper_loss=0.08989, over 3783067.36 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:23:12,032 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 
16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 06:23:25,070 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 06:23:36,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5120450.0, ans=0.2 2024-08-21 06:23:47,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.81 vs. limit=22.5 2024-08-21 06:23:54,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.309e+01 2.543e+01 2.823e+01 1.094e+02, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 06:23:54,663 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-21 06:24:00,423 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 21 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-21 06:24:17,605 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 15 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 06:24:19,573 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 17 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-21 06:24:25,026 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8300, loss[loss=0.08599, beats_loss=0.009091, ecapa_loss=0.0001406, whisper_loss=0.07549, over 16620.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001369, whisper_loss=0.08986, over 3786567.22 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:24:28,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-08-21 06:24:41,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. 
limit=15.0 2024-08-21 06:24:55,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5120850.0, ans=0.125 2024-08-21 06:24:57,500 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 39 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-21 06:25:01,439 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 06:25:06,792 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 23 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-21 06:25:10,512 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 11 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 06:25:14,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5120950.0, ans=0.125 2024-08-21 06:25:35,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2024-08-21 06:25:39,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2024-08-21 06:25:51,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5121150.0, ans=0.125 2024-08-21 06:25:56,026 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8350, loss[loss=0.08127, beats_loss=0.009855, ecapa_loss=0.0001381, whisper_loss=0.07003, over 16361.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.0001383, whisper_loss=0.08974, over 3795323.77 frames. 
], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:26:20,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5121350.0, ans=0.1 2024-08-21 06:26:22,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5121350.0, ans=0.125 2024-08-21 06:26:24,970 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:26:42,110 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 23 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-21 06:26:52,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5121550.0, ans=0.125 2024-08-21 06:26:55,614 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0374552384018898, model_norm_threshold=50.851959228515625 2024-08-21 06:26:55,781 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.895e+05, grad_sumsq=1.757e+07, orig_rms_sq=1.078e-02 2024-08-21 06:26:56,307 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 15 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-21 06:26:59,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.288e+01 2.464e+01 2.782e+01 1.358e+03, threshold=4.928e+01, percent-clipped=2.0 2024-08-21 06:27:01,848 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 06:27:21,732 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-21 06:27:33,209 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8400, loss[loss=0.1052, beats_loss=0.01033, ecapa_loss=0.0001546, whisper_loss=0.09333, over 18891.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.08954, over 3798737.55 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:27:45,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5121750.0, ans=0.0 2024-08-21 06:27:45,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5121750.0, ans=0.1 2024-08-21 06:27:45,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5121750.0, ans=0.125 2024-08-21 06:27:52,473 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 06:28:24,971 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-21 06:28:52,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5122150.0, ans=0.125 2024-08-21 06:29:01,491 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 06:29:06,827 WARNING [optim.py:496] (3/4) Scaling gradients by 0.038237348198890686, model_norm_threshold=49.277313232421875 2024-08-21 06:29:06,995 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.805e+05, grad_sumsq=1.805e+05, orig_rms_sq=1.000e+00 2024-08-21 06:29:07,039 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8450, loss[loss=0.08363, beats_loss=0.01075, ecapa_loss=0.0001274, whisper_loss=0.07161, over 23098.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001379, whisper_loss=0.08956, over 3801327.89 frames. 
], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:29:09,002 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 06:29:18,024 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.439e-01 2024-08-21 06:29:48,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=22.5 2024-08-21 06:29:51,243 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.888e+01 2024-08-21 06:29:59,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2024-08-21 06:30:11,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5122550.0, ans=0.0 2024-08-21 06:30:11,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.430e+01 2.632e+01 3.117e+01 1.289e+03, threshold=5.264e+01, percent-clipped=4.0 2024-08-21 06:30:17,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5122550.0, ans=0.125 2024-08-21 06:30:21,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5122550.0, ans=0.125 2024-08-21 06:30:46,101 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8500, loss[loss=0.1135, beats_loss=0.01054, ecapa_loss=0.0001541, whisper_loss=0.1014, over 17208.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.08983, over 3805617.70 frames. ], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:30:58,322 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
18 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 06:31:09,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=5122850.0, ans=0.1 2024-08-21 06:31:15,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5122850.0, ans=0.1 2024-08-21 06:31:17,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5122850.0, ans=0.2 2024-08-21 06:31:18,082 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-21 06:31:24,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2024-08-21 06:31:27,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5122950.0, ans=0.0 2024-08-21 06:31:38,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5122950.0, ans=0.125 2024-08-21 06:31:43,928 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 26 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-21 06:31:49,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5123050.0, ans=0.0 2024-08-21 06:32:06,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2024-08-21 06:32:10,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5 2024-08-21 06:32:10,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.30 vs. 
limit=22.5 2024-08-21 06:32:24,512 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8550, loss[loss=0.1003, beats_loss=0.00987, ecapa_loss=0.0001203, whisper_loss=0.0892, over 20121.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01028, ecapa_loss=0.0001382, whisper_loss=0.09013, over 3769550.47 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:32:28,849 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 06:33:10,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5123450.0, ans=0.125 2024-08-21 06:33:30,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.375e+01 2.634e+01 2.950e+01 1.431e+02, threshold=5.267e+01, percent-clipped=1.0 2024-08-21 06:33:35,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5123550.0, ans=0.0 2024-08-21 06:33:46,264 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 06:33:51,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5123650.0, ans=0.125 2024-08-21 06:34:04,713 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8600, loss[loss=0.09375, beats_loss=0.01162, ecapa_loss=0.0001725, whisper_loss=0.08041, over 22219.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01023, ecapa_loss=0.000139, whisper_loss=0.09114, over 3776004.83 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:34:08,498 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 
30 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-21 06:34:11,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5123750.0, ans=0.125 2024-08-21 06:34:42,887 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 06:34:47,075 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-21 06:34:50,435 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 34 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 06:34:53,669 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 06:35:14,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-08-21 06:35:23,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5124150.0, ans=0.125 2024-08-21 06:35:27,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5124150.0, ans=0.0 2024-08-21 06:35:30,162 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 06:35:38,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5124150.0, ans=0.1 2024-08-21 06:35:43,143 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8650, loss[loss=0.1064, beats_loss=0.008279, ecapa_loss=0.0001459, whisper_loss=0.09662, over 21064.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0102, ecapa_loss=0.0001384, whisper_loss=0.09127, over 3807718.38 frames. 
], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:35:51,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5124250.0, ans=0.0 2024-08-21 06:35:51,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5124250.0, ans=0.0 2024-08-21 06:35:51,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.62 vs. limit=22.5 2024-08-21 06:36:03,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5124350.0, ans=0.0 2024-08-21 06:36:25,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5124450.0, ans=0.0 2024-08-21 06:36:45,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5124550.0, ans=0.0 2024-08-21 06:36:46,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5124550.0, ans=0.125 2024-08-21 06:36:47,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.320e+01 2.649e+01 2.917e+01 4.406e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-21 06:37:05,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5124650.0, ans=0.2 2024-08-21 06:37:12,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5124650.0, ans=0.125 2024-08-21 06:37:14,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.61 vs. 
limit=12.0 2024-08-21 06:37:21,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5124650.0, ans=0.125 2024-08-21 06:37:24,442 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8700, loss[loss=0.1034, beats_loss=0.00881, ecapa_loss=0.0001521, whisper_loss=0.09308, over 16445.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01025, ecapa_loss=0.0001382, whisper_loss=0.09083, over 3815867.76 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:37:28,851 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 34 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 06:38:07,634 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 37 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-21 06:38:45,953 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 24 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-21 06:38:49,958 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 19 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-21 06:39:00,041 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8750, loss[loss=0.1127, beats_loss=0.007787, ecapa_loss=0.0001554, whisper_loss=0.1033, over 19160.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01025, ecapa_loss=0.000138, whisper_loss=0.09071, over 3809071.68 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:39:10,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5125250.0, ans=0.125 2024-08-21 06:39:14,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5125250.0, ans=0.125 2024-08-21 06:39:19,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.89 vs. limit=22.5 2024-08-21 06:39:25,908 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
26 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-21 06:39:29,889 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 24 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 06:39:35,171 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 15 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-21 06:39:39,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5125450.0, ans=0.125 2024-08-21 06:39:50,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5125450.0, ans=0.125 2024-08-21 06:39:52,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5125450.0, ans=0.125 2024-08-21 06:40:01,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.240e+01 2.524e+01 2.807e+01 1.444e+02, threshold=5.048e+01, percent-clipped=1.0 2024-08-21 06:40:06,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5125550.0, ans=0.035 2024-08-21 06:40:19,304 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:40:26,441 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:40:34,713 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8800, loss[loss=0.1068, beats_loss=0.009113, ecapa_loss=0.0001417, whisper_loss=0.09629, over 22214.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01028, ecapa_loss=0.0001368, whisper_loss=0.09019, over 3792660.01 frames. 
], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:40:40,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5125750.0, ans=0.125 2024-08-21 06:40:49,457 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 27 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-21 06:40:51,528 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 06:40:52,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5125850.0, ans=0.125 2024-08-21 06:40:56,931 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 17 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-21 06:41:01,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2024-08-21 06:41:05,555 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-21 06:41:07,465 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 24 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-21 06:41:11,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5125950.0, ans=0.125 2024-08-21 06:41:31,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5126050.0, ans=0.0 2024-08-21 06:41:35,135 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 
18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 06:41:35,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5126050.0, ans=0.125 2024-08-21 06:41:38,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5126050.0, ans=0.125 2024-08-21 06:41:59,655 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 31 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 06:42:04,617 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8850, loss[loss=0.09674, beats_loss=0.01262, ecapa_loss=0.000106, whisper_loss=0.08306, over 22540.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01029, ecapa_loss=0.0001372, whisper_loss=0.08989, over 3752074.24 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:42:05,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5126250.0, ans=0.2 2024-08-21 06:42:05,798 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:42:09,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-21 06:42:11,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5126250.0, ans=0.125 2024-08-21 06:42:15,078 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-21 06:42:18,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5126250.0, ans=0.125 2024-08-21 06:42:19,491 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 
19 from LS+wenet, 9 from Vox, 31 fro AS 2024-08-21 06:42:31,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5126350.0, ans=0.125 2024-08-21 06:42:50,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5126450.0, ans=0.125 2024-08-21 06:42:51,365 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-21 06:42:59,100 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 19 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-21 06:43:02,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5126450.0, ans=0.1 2024-08-21 06:43:11,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.228e+01 2.528e+01 2.793e+01 5.836e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-21 06:43:18,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5126550.0, ans=0.0 2024-08-21 06:43:21,607 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 25 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-21 06:43:44,105 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 17 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 06:43:45,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2024-08-21 06:43:45,884 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8900, loss[loss=0.08318, beats_loss=0.01116, ecapa_loss=0.0001326, whisper_loss=0.07069, over 15892.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001376, whisper_loss=0.08993, over 3755014.80 frames. 
], batch size: 65, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:44:07,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5126850.0, ans=0.0 2024-08-21 06:44:24,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5126950.0, ans=0.125 2024-08-21 06:44:29,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5126950.0, ans=0.0 2024-08-21 06:44:43,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5127050.0, ans=0.125 2024-08-21 06:44:45,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5127050.0, ans=0.125 2024-08-21 06:44:48,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-21 06:44:57,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-21 06:45:11,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5127150.0, ans=0.2 2024-08-21 06:45:16,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-08-21 06:45:18,665 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 8950, loss[loss=0.09991, beats_loss=0.01055, ecapa_loss=0.0001222, whisper_loss=0.08813, over 20959.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01029, ecapa_loss=0.0001381, whisper_loss=0.09002, over 3776126.25 frames. 
], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:45:18,911 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 06:45:38,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5127350.0, ans=0.125 2024-08-21 06:46:02,961 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 26 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-21 06:46:11,296 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 28 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-21 06:46:22,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5127550.0, ans=0.025 2024-08-21 06:46:23,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.262e+01 2.428e+01 2.785e+01 3.880e+01, threshold=4.857e+01, percent-clipped=0.0 2024-08-21 06:46:35,663 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 06:46:43,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=15.0 2024-08-21 06:46:46,649 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-21 06:46:53,322 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 24 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-21 06:46:56,918 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9000, loss[loss=0.1226, beats_loss=0.01077, ecapa_loss=0.0001184, whisper_loss=0.1106, over 23633.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.000139, whisper_loss=0.09008, over 3798259.74 frames. 
], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:46:56,918 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-21 06:47:34,701 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005065, whisper_loss=0.2487, over 931116.00 frames. 2024-08-21 06:47:57,429 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on SV_voxceleb1: loss=0.003886, beats_loss=0, ecapa_loss=0.0003886, whisper_loss=0, over 944235.00 frames. 2024-08-21 06:49:39,589 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on AT_audioset: loss=0.02296, beats_loss=0.02296, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 06:49:39,592 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-21 06:49:41,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5127750.0, ans=0.125 2024-08-21 06:49:44,526 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-21 06:50:00,641 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-21 06:50:00,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5127850.0, ans=0.015 2024-08-21 06:50:09,668 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 06:50:18,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5127950.0, ans=0.0 2024-08-21 06:51:08,177 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9050, loss[loss=0.1202, beats_loss=0.008907, ecapa_loss=0.000162, whisper_loss=0.1097, over 23259.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001377, whisper_loss=0.08979, over 3800073.67 frames. 
], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:51:10,176 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 06:51:14,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5128250.0, ans=0.1 2024-08-21 06:51:54,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5128450.0, ans=0.125 2024-08-21 06:52:11,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5128550.0, ans=0.1 2024-08-21 06:52:12,253 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.208e+01 2.419e+01 2.777e+01 1.932e+02, threshold=4.839e+01, percent-clipped=1.0 2024-08-21 06:52:12,461 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 26 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-21 06:52:19,796 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-21 06:52:27,770 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 06:52:33,320 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-21 06:52:41,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2024-08-21 06:52:41,763 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9100, loss[loss=0.1205, beats_loss=0.009077, ecapa_loss=0.0001297, whisper_loss=0.1102, over 15483.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001387, whisper_loss=0.09024, over 3769687.18 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:52:43,695 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 20 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-21 06:52:48,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5128750.0, ans=0.125 2024-08-21 06:52:53,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5128750.0, ans=0.1 2024-08-21 06:53:02,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5128850.0, ans=0.1 2024-08-21 06:53:04,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-08-21 06:53:40,905 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 16 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-21 06:53:51,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5129050.0, ans=0.125 2024-08-21 06:53:58,589 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 31 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 06:54:09,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2024-08-21 06:54:15,658 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9150, loss[loss=0.1199, beats_loss=0.006425, ecapa_loss=0.0001493, whisper_loss=0.112, over 13612.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.000139, whisper_loss=0.08976, over 3775413.33 frames. ], batch size: 51, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:54:44,337 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
36 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 06:54:47,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5129350.0, ans=0.125 2024-08-21 06:54:47,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5129350.0, ans=0.125 2024-08-21 06:54:48,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5129350.0, ans=0.125 2024-08-21 06:55:15,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5129550.0, ans=0.0 2024-08-21 06:55:16,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.222e+01 2.469e+01 2.826e+01 4.057e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-21 06:55:46,856 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9200, loss[loss=0.09943, beats_loss=0.0101, ecapa_loss=0.0001609, whisper_loss=0.08772, over 16976.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.0905, over 3792334.64 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:55:57,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5129750.0, ans=0.125 2024-08-21 06:55:57,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5129750.0, ans=0.0 2024-08-21 06:56:14,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2024-08-21 06:56:28,513 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 17 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-21 06:56:57,442 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
29 from LS+wenet, 13 from Vox, 47 fro AS 2024-08-21 06:57:03,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2024-08-21 06:57:11,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5130150.0, ans=0.125 2024-08-21 06:57:22,057 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9250, loss[loss=0.1106, beats_loss=0.009217, ecapa_loss=0.00013, whisper_loss=0.1001, over 14849.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.09051, over 3781715.56 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:57:44,847 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 14 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-21 06:57:55,027 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:57:56,888 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-08-21 06:58:13,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5130450.0, ans=0.125 2024-08-21 06:58:17,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=5130550.0, ans=15.0 2024-08-21 06:58:17,925 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 
34 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-21 06:58:23,654 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.254e+01 2.574e+01 2.946e+01 4.918e+02, threshold=5.149e+01, percent-clipped=3.0 2024-08-21 06:58:26,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5130550.0, ans=0.125 2024-08-21 06:58:33,872 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-21 06:58:54,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5130650.0, ans=0.1 2024-08-21 06:58:58,623 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9300, loss[loss=0.08962, beats_loss=0.01207, ecapa_loss=0.0001304, whisper_loss=0.07624, over 22633.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001396, whisper_loss=0.09026, over 3827248.47 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:59:06,581 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
28 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 06:59:11,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5130750.0, ans=0.125 2024-08-21 06:59:11,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5130750.0, ans=0.2 2024-08-21 06:59:36,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5130950.0, ans=0.125 2024-08-21 06:59:44,165 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.646e+00 2024-08-21 06:59:48,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5130950.0, ans=0.125 2024-08-21 07:00:00,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2024-08-21 07:00:03,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5131050.0, ans=0.1 2024-08-21 07:00:06,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5131050.0, ans=0.2 2024-08-21 07:00:11,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5131050.0, ans=0.0 2024-08-21 07:00:16,454 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-21 07:00:23,535 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
14 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 07:00:33,904 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9350, loss[loss=0.1056, beats_loss=0.009628, ecapa_loss=0.0001302, whisper_loss=0.09469, over 18304.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001384, whisper_loss=0.09064, over 3843473.82 frames. ], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:00:34,156 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 36 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-21 07:00:39,980 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-21 07:00:42,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5131250.0, ans=0.0 2024-08-21 07:01:35,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.276e+01 2.548e+01 2.859e+01 2.021e+02, threshold=5.096e+01, percent-clipped=1.0 2024-08-21 07:01:49,601 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.424e+00 2024-08-21 07:01:51,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5131650.0, ans=0.125 2024-08-21 07:02:07,481 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9400, loss[loss=0.0834, beats_loss=0.01134, ecapa_loss=0.0001357, whisper_loss=0.0707, over 17326.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.09045, over 3830441.22 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:02:12,890 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 07:02:28,048 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 22 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-21 07:02:31,684 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
25 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 07:02:44,778 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 14 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-21 07:02:47,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5131950.0, ans=0.0 2024-08-21 07:02:48,581 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-21 07:02:49,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5131950.0, ans=0.125 2024-08-21 07:03:10,286 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-21 07:03:40,016 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9450, loss[loss=0.1169, beats_loss=0.008726, ecapa_loss=0.0001152, whisper_loss=0.107, over 18857.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01025, ecapa_loss=0.0001384, whisper_loss=0.09107, over 3812188.75 frames. ], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:03:59,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5 2024-08-21 07:04:04,423 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 26 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-21 07:04:07,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5132350.0, ans=0.0 2024-08-21 07:04:39,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5132550.0, ans=0.0 2024-08-21 07:04:40,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.244e+01 2.507e+01 2.864e+01 1.489e+02, threshold=5.014e+01, percent-clipped=2.0 2024-08-21 07:04:40,822 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-21 07:05:13,303 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9500, loss[loss=0.08601, beats_loss=0.008997, ecapa_loss=0.0001729, whisper_loss=0.07529, over 16075.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01022, ecapa_loss=0.0001384, whisper_loss=0.09147, over 3829643.76 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:05:32,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5132850.0, ans=0.0 2024-08-21 07:05:37,553 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 07:05:42,998 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 23 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 07:05:47,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5132850.0, ans=0.0 2024-08-21 07:05:53,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5132950.0, ans=0.0 2024-08-21 07:05:55,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5132950.0, ans=0.125 2024-08-21 07:06:28,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5133150.0, ans=0.1 2024-08-21 07:06:47,324 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9550, loss[loss=0.08192, beats_loss=0.01177, ecapa_loss=0.0001188, whisper_loss=0.06897, over 15899.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01024, ecapa_loss=0.0001376, whisper_loss=0.09096, over 3836941.42 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:06:57,077 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.779e-02 2024-08-21 07:06:57,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5133250.0, ans=0.125 2024-08-21 07:07:04,558 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 26 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-21 07:07:16,304 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-21 07:07:26,312 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-21 07:07:29,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5133450.0, ans=0.125 2024-08-21 07:07:32,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5133450.0, ans=0.125 2024-08-21 07:07:43,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5133550.0, ans=0.125 2024-08-21 07:07:49,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.309e+01 2.529e+01 2.824e+01 3.800e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 07:07:53,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5133550.0, ans=0.0 2024-08-21 07:07:54,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5133550.0, ans=0.0 2024-08-21 07:08:19,678 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9600, loss[loss=0.0937, beats_loss=0.01193, ecapa_loss=0.0001139, whisper_loss=0.08062, over 19223.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01026, ecapa_loss=0.0001376, whisper_loss=0.09046, over 3829175.38 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:08:22,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5133750.0, ans=0.0 2024-08-21 07:08:50,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2024-08-21 07:08:56,241 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-21 07:08:59,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5133950.0, ans=0.125 2024-08-21 07:09:07,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5133950.0, ans=0.125 2024-08-21 07:09:24,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5134050.0, ans=0.125 2024-08-21 07:09:38,255 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 17 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-21 07:09:48,712 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9650, loss[loss=0.1076, beats_loss=0.009716, ecapa_loss=0.0001329, whisper_loss=0.09655, over 18280.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001385, whisper_loss=0.09007, over 3850345.67 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:09:49,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-21 07:10:04,857 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 
18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 07:10:21,232 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 07:10:24,969 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 19 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 07:10:49,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.246e+01 2.506e+01 2.851e+01 2.599e+02, threshold=5.012e+01, percent-clipped=4.0 2024-08-21 07:10:57,333 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 07:11:01,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5134650.0, ans=0.125 2024-08-21 07:11:12,421 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 16 from LS+wenet, 5 from Vox, 29 fro AS 2024-08-21 07:11:19,417 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9700, loss[loss=0.08567, beats_loss=0.01433, ecapa_loss=0.000102, whisper_loss=0.07032, over 21182.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001375, whisper_loss=0.08955, over 3843362.25 frames. ], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:11:46,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-21 07:11:56,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5134950.0, ans=0.125 2024-08-21 07:11:58,238 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 23 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-21 07:11:59,514 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 
24 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-21 07:12:21,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5135050.0, ans=0.125 2024-08-21 07:12:26,608 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-21 07:12:50,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5135250.0, ans=0.125 2024-08-21 07:12:50,850 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9750, loss[loss=0.1086, beats_loss=0.009856, ecapa_loss=0.0001277, whisper_loss=0.09749, over 23574.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001373, whisper_loss=0.08964, over 3851986.69 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:12:53,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5135250.0, ans=0.125 2024-08-21 07:13:02,121 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-21 07:13:04,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5135250.0, ans=0.1 2024-08-21 07:13:07,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2024-08-21 07:13:08,976 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 07:13:19,256 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 07:13:23,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2024-08-21 07:13:32,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5135450.0, ans=0.04949747468305833 2024-08-21 07:13:36,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2024-08-21 07:13:47,633 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 31 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-21 07:13:48,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5135550.0, ans=0.125 2024-08-21 07:13:49,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5135550.0, ans=0.2 2024-08-21 07:13:52,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.258e+01 2.458e+01 2.685e+01 1.396e+02, threshold=4.917e+01, percent-clipped=1.0 2024-08-21 07:13:58,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5135550.0, ans=10.0 2024-08-21 07:14:06,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5135650.0, ans=0.125 2024-08-21 07:14:15,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5135650.0, ans=0.0 2024-08-21 07:14:20,952 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9800, loss[loss=0.08064, beats_loss=0.01441, ecapa_loss=0.0001066, whisper_loss=0.06516, over 15021.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001372, whisper_loss=0.09003, over 3880233.86 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:14:29,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2024-08-21 07:14:46,206 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 32 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-21 07:14:47,917 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 35 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-21 07:15:05,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5135950.0, ans=0.0 2024-08-21 07:15:07,952 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 26 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-21 07:15:09,783 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 28 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-21 07:15:16,605 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 30 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-21 07:15:24,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5136050.0, ans=0.125 2024-08-21 07:15:38,169 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 22 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-21 07:15:41,855 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-21 07:15:54,760 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9850, loss[loss=0.1054, beats_loss=0.009067, ecapa_loss=0.0001953, whisper_loss=0.09434, over 16499.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001385, whisper_loss=0.08988, over 3852881.65 frames. 
], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:16:12,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5136250.0, ans=0.1 2024-08-21 07:16:13,277 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 34 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-21 07:17:00,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-21 07:17:00,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.250e+01 2.454e+01 2.726e+01 7.431e+01, threshold=4.908e+01, percent-clipped=3.0 2024-08-21 07:17:03,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5136550.0, ans=0.125 2024-08-21 07:17:23,265 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-21 07:17:33,747 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9900, loss[loss=0.09114, beats_loss=0.01147, ecapa_loss=0.0001007, whisper_loss=0.07866, over 15887.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001378, whisper_loss=0.08998, over 3853130.05 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:17:39,367 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-21 07:17:52,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=22.5 2024-08-21 07:18:12,236 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 20 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-21 07:18:16,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. 
limit=6.0 2024-08-21 07:18:44,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.20 vs. limit=10.0 2024-08-21 07:18:45,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5137050.0, ans=0.2 2024-08-21 07:18:51,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.92 vs. limit=6.0 2024-08-21 07:18:58,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5137150.0, ans=0.1 2024-08-21 07:18:59,948 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 22 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-21 07:19:07,790 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 9950, loss[loss=0.1037, beats_loss=0.01187, ecapa_loss=0.0001484, whisper_loss=0.09032, over 22953.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.0001384, whisper_loss=0.08961, over 3817690.75 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:19:15,222 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 
34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 07:20:11,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.331e+01 2.493e+01 2.737e+01 3.742e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-21 07:20:17,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5137550.0, ans=0.025 2024-08-21 07:20:20,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5137650.0, ans=0.125 2024-08-21 07:20:24,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5137650.0, ans=0.125 2024-08-21 07:20:24,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5137650.0, ans=0.04949747468305833 2024-08-21 07:20:26,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5137650.0, ans=0.125 2024-08-21 07:20:27,591 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 07:20:38,547 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 07:20:40,318 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10000, loss[loss=0.08138, beats_loss=0.0116, ecapa_loss=0.0001387, whisper_loss=0.0684, over 15714.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001385, whisper_loss=0.0892, over 3791355.45 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:21:11,868 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-21 07:21:15,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=127.05 vs. 
limit=15.0 2024-08-21 07:21:20,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5137950.0, ans=0.04949747468305833 2024-08-21 07:21:22,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5137950.0, ans=0.125 2024-08-21 07:21:33,472 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 07:21:36,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-08-21 07:21:42,928 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 14 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 07:22:09,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0 2024-08-21 07:22:14,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2024-08-21 07:22:14,657 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10050, loss[loss=0.101, beats_loss=0.009259, ecapa_loss=0.0001485, whisper_loss=0.09026, over 19313.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001382, whisper_loss=0.09, over 3818882.14 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:22:19,585 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 07:22:30,643 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
26 from LS+wenet, 14 from Vox, 28 from AS 2024-08-21 07:22:32,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5138250.0, ans=0.1 2024-08-21 07:22:36,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5138350.0, ans=0.2 2024-08-21 07:22:39,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=12.0 2024-08-21 07:22:42,019 WARNING [optim.py:496] (3/4) Scaling gradients by 0.01775754615664482, model_norm_threshold=49.858680725097656 2024-08-21 07:22:42,186 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.817e+06, grad_sumsq=1.684e+08, orig_rms_sq=1.079e-02 2024-08-21 07:22:45,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-21 07:22:54,283 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 15 from LS+wenet, 23 from Vox, 29 from AS 2024-08-21 07:23:02,399 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 from AS 2024-08-21 07:23:16,467 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 from AS 2024-08-21 07:23:27,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.254e+01 2.564e+01 3.028e+01 2.808e+03, threshold=5.129e+01, percent-clipped=1.0 2024-08-21 07:23:27,388 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 from AS 2024-08-21 07:23:30,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.36 vs. 
limit=12.0 2024-08-21 07:23:41,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=5138650.0, ans=0.05 2024-08-21 07:23:46,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5138650.0, ans=0.1 2024-08-21 07:24:00,788 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 from AS 2024-08-21 07:24:02,644 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10100, loss[loss=0.1018, beats_loss=0.0109, ecapa_loss=0.0001504, whisper_loss=0.08941, over 15881.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.000138, whisper_loss=0.08969, over 3818620.95 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:24:03,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5138750.0, ans=0.1 2024-08-21 07:24:23,262 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 18 from LS+wenet, 28 from Vox, 37 from AS 2024-08-21 07:24:27,245 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 21 from LS+wenet, 21 from Vox, 27 from AS 2024-08-21 07:24:29,589 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 from AS 2024-08-21 07:24:45,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5138950.0, ans=0.125 2024-08-21 07:24:55,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=22.5 2024-08-21 07:25:05,683 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 from AS 2024-08-21 07:25:07,361 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 
25 from LS+wenet, 16 from Vox, 26 from AS 2024-08-21 07:25:08,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5139050.0, ans=0.0 2024-08-21 07:25:35,254 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 37 from LS+wenet, 26 from Vox, 24 from AS 2024-08-21 07:25:36,848 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10150, loss[loss=0.1284, beats_loss=0.006789, ecapa_loss=0.0001767, whisper_loss=0.1198, over 22277.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.08991, over 3771954.21 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:26:05,411 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 32 from LS+wenet, 14 from Vox, 28 from AS 2024-08-21 07:26:29,102 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 from AS 2024-08-21 07:26:44,482 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.296e+01 2.505e+01 2.874e+01 3.996e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-21 07:26:50,808 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 37 from LS+wenet, 19 from Vox, 36 from AS 2024-08-21 07:26:55,101 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 from AS 2024-08-21 07:27:15,374 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10200, loss[loss=0.0928, beats_loss=0.009674, ecapa_loss=0.0001515, whisper_loss=0.08161, over 17084.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001394, whisper_loss=0.09001, over 3799210.01 frames. ], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:27:21,433 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-21 07:27:46,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5139850.0, ans=0.125 2024-08-21 07:28:08,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2024-08-21 07:28:12,591 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 25 from LS+wenet, 13 from Vox, 31 from AS 2024-08-21 07:28:17,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=5140050.0, ans=0.5 2024-08-21 07:28:27,051 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-21 07:28:34,989 WARNING [optim.py:496] (3/4) Scaling gradients by 0.040334705263376236, model_norm_threshold=50.09689712524414 2024-08-21 07:28:35,158 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.841e+05, grad_sumsq=1.841e+05, orig_rms_sq=1.000e+00 2024-08-21 07:28:43,279 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 28 from LS+wenet, 17 from Vox, 32 from AS 2024-08-21 07:28:50,995 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10250, loss[loss=0.08922, beats_loss=0.01205, ecapa_loss=0.0001164, whisper_loss=0.07601, over 21691.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001376, whisper_loss=0.08966, over 3786349.61 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:29:18,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. 
limit=22.5 2024-08-21 07:29:18,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.99 vs. limit=10.0 2024-08-21 07:29:48,091 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS 2024-08-21 07:29:56,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.288e+01 2.559e+01 2.960e+01 1.242e+03, threshold=5.118e+01, percent-clipped=2.0 2024-08-21 07:30:21,365 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.346e+00 2024-08-21 07:30:28,606 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10300, loss[loss=0.1049, beats_loss=0.008229, ecapa_loss=0.0001772, whisper_loss=0.09488, over 20577.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001389, whisper_loss=0.0902, over 3805176.83 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:30:29,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.15 vs. limit=10.0 2024-08-21 07:30:54,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5140850.0, ans=0.07 2024-08-21 07:31:17,833 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 20 from LS+wenet, 26 from Vox, 40 from AS 2024-08-21 07:31:34,416 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 from AS 2024-08-21 07:31:44,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5141050.0, ans=0.125 2024-08-21 07:31:51,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5141050.0, ans=0.125 2024-08-21 07:32:21,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5141150.0, ans=0.125 2024-08-21 07:32:24,716 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10350, loss[loss=0.08494, beats_loss=0.009911, ecapa_loss=0.0001781, whisper_loss=0.07324, over 15686.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001394, whisper_loss=0.08961, over 3813316.54 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:32:41,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5141250.0, ans=0.125 2024-08-21 07:32:44,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=5141350.0, ans=0.02 2024-08-21 07:32:47,023 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 16 from LS+wenet, 14 from Vox, 33 from AS 2024-08-21 07:32:56,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5141350.0, ans=0.125 2024-08-21 07:32:57,585 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
18 from LS+wenet, 13 from Vox, 27 from AS 2024-08-21 07:32:57,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5141350.0, ans=0.015 2024-08-21 07:32:57,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5141350.0, ans=0.125 2024-08-21 07:33:32,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5141550.0, ans=0.125 2024-08-21 07:33:35,177 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.267e+01 2.630e+01 2.969e+01 5.000e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-21 07:33:39,345 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 17 from Vox, 23 from AS 2024-08-21 07:33:40,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2024-08-21 07:33:52,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-21 07:33:55,967 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 22 from LS+wenet, 28 from Vox, 32 from AS 2024-08-21 07:34:08,030 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10400, loss[loss=0.0904, beats_loss=0.01495, ecapa_loss=0.000135, whisper_loss=0.0741, over 15887.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.08955, over 3770963.73 frames. 
], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:34:11,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5141750.0, ans=0.125 2024-08-21 07:34:42,500 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:35:17,293 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 from AS 2024-08-21 07:35:54,748 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10450, loss[loss=0.07784, beats_loss=0.01087, ecapa_loss=0.0001281, whisper_loss=0.06569, over 20489.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09025, over 3784938.65 frames. ], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:36:02,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5142250.0, ans=0.0 2024-08-21 07:36:20,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-21 07:36:53,843 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 from AS 2024-08-21 07:37:18,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.371e+01 2.736e+01 3.043e+01 5.041e+02, threshold=5.472e+01, percent-clipped=3.0 2024-08-21 07:37:19,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5142550.0, ans=0.0 2024-08-21 07:37:24,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5142550.0, ans=0.125 2024-08-21 07:37:36,414 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 
21 from LS+wenet, 26 from Vox, 44 from AS 2024-08-21 07:37:40,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5142650.0, ans=0.125 2024-08-21 07:37:40,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5142650.0, ans=0.125 2024-08-21 07:37:50,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5142650.0, ans=0.2 2024-08-21 07:37:53,572 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10500, loss[loss=0.1035, beats_loss=0.01021, ecapa_loss=0.00017, whisper_loss=0.09158, over 21537.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001396, whisper_loss=0.09053, over 3784129.57 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:37:57,105 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS 2024-08-21 07:37:57,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5142750.0, ans=0.015 2024-08-21 07:38:10,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.08 vs. limit=22.5 2024-08-21 07:38:10,785 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 18 from LS+wenet, 21 from Vox, 43 from AS 2024-08-21 07:38:16,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5142850.0, ans=0.2 2024-08-21 07:38:31,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5142850.0, ans=0.1 2024-08-21 07:38:54,622 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 
25 from LS+wenet, 17 from Vox, 26 from AS 2024-08-21 07:39:00,661 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 26 from LS+wenet, 24 from Vox, 45 from AS 2024-08-21 07:39:21,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5143150.0, ans=0.1 2024-08-21 07:39:24,874 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 from AS 2024-08-21 07:39:37,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5143150.0, ans=0.125 2024-08-21 07:39:41,922 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10550, loss[loss=0.1073, beats_loss=0.0113, ecapa_loss=0.0001351, whisper_loss=0.09467, over 22061.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08966, over 3768848.05 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:40:00,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2024-08-21 07:40:06,836 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 from AS 2024-08-21 07:40:23,168 INFO [train_multi_KD3.py:845] (3/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 from AS 2024-08-21 07:40:24,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5143450.0, ans=0.2 2024-08-21 07:40:49,364 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 
30 from LS+wenet, 8 from Vox, 36 from AS 2024-08-21 07:40:50,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.474e+01 2.751e+01 3.009e+02, threshold=4.947e+01, percent-clipped=3.0 2024-08-21 07:40:53,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5143550.0, ans=0.125 2024-08-21 07:40:59,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5143550.0, ans=0.125 2024-08-21 07:41:00,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5143650.0, ans=0.0 2024-08-21 07:41:18,623 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.122e+05 2024-08-21 07:41:20,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5143750.0, ans=0.0 2024-08-21 07:41:20,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=5143750.0, ans=15.0 2024-08-21 07:41:21,522 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10600, loss[loss=0.11, beats_loss=0.008805, ecapa_loss=0.0001448, whisper_loss=0.09978, over 15602.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001382, whisper_loss=0.08988, over 3760702.14 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:41:35,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5143750.0, ans=0.0 2024-08-21 07:41:52,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.49 vs. 
limit=12.0 2024-08-21 07:41:54,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-08-21 07:41:59,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2024-08-21 07:42:34,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5144150.0, ans=0.125 2024-08-21 07:42:38,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5144150.0, ans=0.025 2024-08-21 07:42:53,206 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 22 from LS+wenet, 13 from Vox, 36 from AS 2024-08-21 07:42:55,662 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10650, loss[loss=0.09909, beats_loss=0.01206, ecapa_loss=9.965e-05, whisper_loss=0.08603, over 17896.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001381, whisper_loss=0.08962, over 3759075.86 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:43:10,006 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 from AS 2024-08-21 07:43:20,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2024-08-21 07:43:21,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5144350.0, ans=0.07 2024-08-21 07:43:24,059 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 
19 from LS+wenet, 16 from Vox, 17 from AS 2024-08-21 07:43:45,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5144450.0, ans=0.125 2024-08-21 07:43:56,514 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 from AS 2024-08-21 07:44:04,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.276e+01 2.540e+01 2.903e+01 1.576e+02, threshold=5.081e+01, percent-clipped=1.0 2024-08-21 07:44:17,497 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 from AS 2024-08-21 07:44:33,455 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10700, loss[loss=0.08612, beats_loss=0.01306, ecapa_loss=9.902e-05, whisper_loss=0.07206, over 19665.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.08965, over 3800027.35 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:44:49,059 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 17 from LS+wenet, 25 from Vox, 41 from AS 2024-08-21 07:44:57,040 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 14 from LS+wenet, 14 from Vox, 34 from AS 2024-08-21 07:44:58,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.01 vs. 
limit=12.0 2024-08-21 07:44:59,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5144850.0, ans=0.1 2024-08-21 07:45:03,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5144850.0, ans=0.0 2024-08-21 07:45:15,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5144950.0, ans=0.125 2024-08-21 07:45:40,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5145050.0, ans=0.125 2024-08-21 07:45:43,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5145050.0, ans=0.125 2024-08-21 07:46:10,563 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10750, loss[loss=0.1209, beats_loss=0.009077, ecapa_loss=0.0001475, whisper_loss=0.1104, over 21718.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001384, whisper_loss=0.09013, over 3830018.49 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:46:41,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5145350.0, ans=0.125 2024-08-21 07:46:53,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5145450.0, ans=0.125 2024-08-21 07:46:54,889 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 from AS 2024-08-21 07:47:06,405 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 
19 from LS+wenet, 22 from Vox, 34 from AS 2024-08-21 07:47:16,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.270e+01 2.527e+01 2.757e+01 4.165e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-21 07:47:35,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5 2024-08-21 07:47:47,532 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10800, loss[loss=0.05606, beats_loss=0.01145, ecapa_loss=0.0001356, whisper_loss=0.04325, over 13345.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001385, whisper_loss=0.08949, over 3862936.63 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:48:12,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5145850.0, ans=0.015 2024-08-21 07:48:42,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5146050.0, ans=0.1 2024-08-21 07:48:49,851 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 16 from LS+wenet, 15 from Vox, 33 from AS 2024-08-21 07:48:55,164 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 from AS 2024-08-21 07:49:15,550 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 from AS 2024-08-21 07:49:20,940 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10850, loss[loss=0.1034, beats_loss=0.01058, ecapa_loss=0.0001508, whisper_loss=0.09136, over 20972.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.08941, over 3854819.93 frames. 
], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:49:33,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5146250.0, ans=0.0 2024-08-21 07:49:40,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5146350.0, ans=0.0 2024-08-21 07:50:14,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5146450.0, ans=0.0 2024-08-21 07:50:18,554 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 25 from LS+wenet, 8 from Vox, 27 from AS 2024-08-21 07:50:23,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.344e+01 2.543e+01 2.878e+01 8.431e+01, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 07:50:34,453 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:50:38,489 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 17 from LS+wenet, 18 from Vox, 28 from AS 2024-08-21 07:50:42,301 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 from AS 2024-08-21 07:50:52,687 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10900, loss[loss=0.1179, beats_loss=0.01009, ecapa_loss=0.0001067, whisper_loss=0.1068, over 20001.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001394, whisper_loss=0.09015, over 3816097.63 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:51:04,044 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 from AS 2024-08-21 07:51:21,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.24 vs. 
limit=15.0 2024-08-21 07:51:38,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5146950.0, ans=0.1 2024-08-21 07:51:39,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2024-08-21 07:52:22,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5147250.0, ans=0.1 2024-08-21 07:52:23,183 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 10950, loss[loss=0.1237, beats_loss=0.007125, ecapa_loss=0.0001648, whisper_loss=0.1149, over 18489.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.0001394, whisper_loss=0.09082, over 3809162.45 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:52:55,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5147350.0, ans=0.0 2024-08-21 07:53:05,652 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS 2024-08-21 07:53:22,473 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.337e+01 2.519e+01 2.828e+01 1.066e+02, threshold=5.038e+01, percent-clipped=2.0 2024-08-21 07:53:26,465 INFO [train_multi_KD3.py:845] (3/4) A total of 78 cuts. 17 from LS+wenet, 29 from Vox, 32 from AS 2024-08-21 07:53:47,075 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 19 from LS+wenet, 13 from Vox, 56 from AS 2024-08-21 07:53:50,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.80 vs. limit=5.0 2024-08-21 07:53:52,847 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11000, loss[loss=0.1228, beats_loss=0.009384, ecapa_loss=0.0001529, whisper_loss=0.1119, over 21852.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.09023, over 3846237.34 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:54:19,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5147850.0, ans=0.1 2024-08-21 07:54:20,020 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 24 from LS+wenet, 11 from Vox, 31 from AS 2024-08-21 07:54:33,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.91 vs. limit=15.0 2024-08-21 07:54:34,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2024-08-21 07:54:36,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5147950.0, ans=0.1 2024-08-21 07:54:40,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5147950.0, ans=0.1 2024-08-21 07:55:06,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=22.5 2024-08-21 07:55:11,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=5148150.0, ans=15.0 2024-08-21 07:55:21,765 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11050, loss[loss=0.1061, beats_loss=0.007942, ecapa_loss=0.0001384, whisper_loss=0.09679, over 14086.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001395, whisper_loss=0.09032, over 3870199.60 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:55:27,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-08-21 07:55:28,364 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-21 07:55:47,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5148350.0, ans=0.2 2024-08-21 07:55:55,528 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 17 from Vox, 51 fro AS 2024-08-21 07:56:08,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5148450.0, ans=0.125 2024-08-21 07:56:19,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.275e+01 2.484e+01 2.790e+01 7.658e+01, threshold=4.968e+01, percent-clipped=1.0 2024-08-21 07:56:23,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5148550.0, ans=0.0 2024-08-21 07:56:26,913 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 20 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-21 07:56:43,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5148650.0, ans=0.125 2024-08-21 07:56:48,757 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11100, loss[loss=0.0828, beats_loss=0.01112, ecapa_loss=0.0001527, whisper_loss=0.07016, over 20283.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001386, whisper_loss=0.09057, over 3890805.66 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:56:52,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.19 vs. 
limit=15.0 2024-08-21 07:56:53,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5148750.0, ans=0.125 2024-08-21 07:56:56,131 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 31 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 07:57:02,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=22.5 2024-08-21 07:57:04,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5148750.0, ans=0.0 2024-08-21 07:57:15,003 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 07:57:34,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2024-08-21 07:57:45,516 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-21 07:57:47,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5149050.0, ans=0.2 2024-08-21 07:57:55,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5149050.0, ans=0.125 2024-08-21 07:58:00,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5149150.0, ans=0.125 2024-08-21 07:58:13,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5149150.0, ans=0.1 2024-08-21 07:58:18,681 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11150, loss[loss=0.09735, beats_loss=0.009429, ecapa_loss=0.000154, whisper_loss=0.08638, over 21412.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001394, whisper_loss=0.0907, over 3909611.77 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:58:34,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5149350.0, ans=0.125 2024-08-21 07:58:42,217 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 07:58:46,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5149350.0, ans=0.1 2024-08-21 07:58:47,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5149350.0, ans=0.015 2024-08-21 07:59:02,565 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 25 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-21 07:59:11,713 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.152e-01 2024-08-21 07:59:15,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5149550.0, ans=0.0 2024-08-21 07:59:17,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.340e+01 2.550e+01 2.884e+01 1.372e+02, threshold=5.100e+01, percent-clipped=2.0 2024-08-21 07:59:20,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5149550.0, ans=0.07 2024-08-21 07:59:21,668 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 
28 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-21 07:59:26,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5149550.0, ans=0.125 2024-08-21 07:59:46,177 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11200, loss[loss=0.1176, beats_loss=0.01033, ecapa_loss=0.0001701, whisper_loss=0.1056, over 20983.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01034, ecapa_loss=0.0001396, whisper_loss=0.09141, over 3918875.55 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:59:48,336 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 08:00:09,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5149850.0, ans=0.2 2024-08-21 08:00:15,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5149850.0, ans=0.125 2024-08-21 08:00:43,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5149950.0, ans=0.125 2024-08-21 08:00:56,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-08-21 08:01:00,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5150050.0, ans=0.125 2024-08-21 08:01:00,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5150050.0, ans=0.125 2024-08-21 08:01:09,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5150150.0, ans=0.2 2024-08-21 08:01:23,505 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 
26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-21 08:01:29,908 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11250, loss[loss=0.08035, beats_loss=0.009072, ecapa_loss=0.0001413, whisper_loss=0.06986, over 17072.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.09116, over 3903050.15 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:01:48,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5150250.0, ans=0.0 2024-08-21 08:01:51,404 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-21 08:02:33,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5150550.0, ans=0.0 2024-08-21 08:02:35,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5150550.0, ans=0.0 2024-08-21 08:02:39,642 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.829e+00 2024-08-21 08:02:42,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.342e+01 2.612e+01 2.998e+01 2.607e+02, threshold=5.224e+01, percent-clipped=1.0 2024-08-21 08:02:56,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5150650.0, ans=0.125 2024-08-21 08:03:09,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5150650.0, ans=0.2 2024-08-21 08:03:15,127 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11300, loss[loss=0.1216, beats_loss=0.00829, ecapa_loss=0.0001385, whisper_loss=0.1119, over 23360.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.09068, over 3890515.08 frames. 
], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:03:38,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5150850.0, ans=0.2 2024-08-21 08:04:32,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5151050.0, ans=0.0 2024-08-21 08:04:38,866 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-21 08:04:54,571 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11350, loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001333, whisper_loss=0.09114, over 20101.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.000139, whisper_loss=0.0904, over 3854628.85 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:05:15,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5151350.0, ans=0.0 2024-08-21 08:05:15,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-08-21 08:05:18,585 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:05:25,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5151350.0, ans=0.125 2024-08-21 08:05:32,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5151450.0, ans=0.2 2024-08-21 08:05:32,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. 
limit=15.0 2024-08-21 08:05:47,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5151450.0, ans=0.125 2024-08-21 08:05:57,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.232e+01 2.527e+01 2.803e+01 3.759e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-21 08:06:10,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5151650.0, ans=0.0 2024-08-21 08:06:11,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2024-08-21 08:06:13,847 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-21 08:06:18,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5151650.0, ans=0.125 2024-08-21 08:06:20,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.33 vs. limit=22.5 2024-08-21 08:06:23,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5151650.0, ans=0.1 2024-08-21 08:06:26,844 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11400, loss[loss=0.08761, beats_loss=0.01041, ecapa_loss=0.0001589, whisper_loss=0.07561, over 14440.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001392, whisper_loss=0.09041, over 3827524.05 frames. 
], batch size: 62, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:06:43,851 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:06:45,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5151850.0, ans=0.125 2024-08-21 08:06:49,228 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 29 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-21 08:06:58,724 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:07:00,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0 2024-08-21 08:07:33,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5152050.0, ans=0.1 2024-08-21 08:08:06,650 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11450, loss[loss=0.09613, beats_loss=0.01268, ecapa_loss=0.0001337, whisper_loss=0.08211, over 22260.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.09091, over 3856782.05 frames. 
], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:08:07,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5152250.0, ans=0.125 2024-08-21 08:08:07,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5152250.0, ans=0.125 2024-08-21 08:08:09,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5152250.0, ans=0.2 2024-08-21 08:08:40,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2024-08-21 08:09:08,362 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-21 08:09:13,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5152550.0, ans=0.0 2024-08-21 08:09:14,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.353e+01 2.552e+01 2.800e+01 3.552e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-21 08:09:14,827 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 18 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 08:09:26,997 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-21 08:09:31,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5152650.0, ans=0.125 2024-08-21 08:09:34,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5152650.0, ans=0.125 2024-08-21 08:09:46,858 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11500, loss[loss=0.08473, beats_loss=0.01141, ecapa_loss=0.0001366, whisper_loss=0.07195, over 21748.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01028, ecapa_loss=0.000139, whisper_loss=0.09098, over 3826751.30 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:10:12,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.76 vs. limit=10.0 2024-08-21 08:10:13,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5152850.0, ans=0.0 2024-08-21 08:10:16,551 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 20 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-21 08:10:21,989 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-21 08:10:25,773 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-21 08:10:27,947 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 29 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-21 08:11:00,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5153050.0, ans=0.2 2024-08-21 08:11:14,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5153150.0, ans=0.125 2024-08-21 08:11:17,959 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-21 08:11:23,285 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11550, loss[loss=0.1076, beats_loss=0.007802, ecapa_loss=0.0001027, whisper_loss=0.09878, over 14339.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01026, ecapa_loss=0.0001388, whisper_loss=0.09115, over 3794074.10 frames. ], batch size: 50, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:11:33,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.48 vs. 
limit=22.5 2024-08-21 08:11:43,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-08-21 08:11:46,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5153350.0, ans=0.125 2024-08-21 08:11:49,285 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 19 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-21 08:12:07,714 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-21 08:12:11,213 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-21 08:12:13,312 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-21 08:12:27,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.384e+01 2.690e+01 2.968e+01 5.018e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-21 08:12:29,836 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 08:12:54,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5153650.0, ans=0.0 2024-08-21 08:12:57,141 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11600, loss[loss=0.08561, beats_loss=0.01037, ecapa_loss=0.0001852, whisper_loss=0.07339, over 13560.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001377, whisper_loss=0.09072, over 3785967.16 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:13:02,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5153750.0, ans=0.125 2024-08-21 08:13:10,053 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-21 08:13:17,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2024-08-21 08:13:29,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5153850.0, ans=0.0 2024-08-21 08:13:38,200 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 20 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-21 08:13:58,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5154050.0, ans=0.0 2024-08-21 08:14:28,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5154150.0, ans=0.2 2024-08-21 08:14:29,279 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 22 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-21 08:14:33,750 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 08:14:35,499 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11650, loss[loss=0.08869, beats_loss=0.01047, ecapa_loss=9.846e-05, whisper_loss=0.07724, over 14779.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001376, whisper_loss=0.09079, over 3770484.84 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:14:40,901 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.077e-01 2024-08-21 08:14:43,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.05 vs. 
limit=15.0 2024-08-21 08:15:04,662 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:15:11,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5154350.0, ans=0.2 2024-08-21 08:15:28,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.93 vs. limit=6.0 2024-08-21 08:15:40,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.291e+01 2.549e+01 2.927e+01 7.915e+01, threshold=5.097e+01, percent-clipped=1.0 2024-08-21 08:16:07,815 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-21 08:16:11,412 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11700, loss[loss=0.09516, beats_loss=0.01266, ecapa_loss=0.0001057, whisper_loss=0.08144, over 22852.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.000138, whisper_loss=0.08962, over 3787305.00 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:16:14,959 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 31 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 08:16:29,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5154850.0, ans=0.0 2024-08-21 08:16:32,613 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 08:16:36,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5154850.0, ans=0.125 2024-08-21 08:16:50,315 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 29 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-21 08:16:52,199 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 
24 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-21 08:16:52,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5154950.0, ans=0.0 2024-08-21 08:16:57,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5154950.0, ans=0.125 2024-08-21 08:17:01,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5154950.0, ans=0.125 2024-08-21 08:17:08,175 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 34 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-21 08:17:12,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2024-08-21 08:17:19,026 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 17 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-21 08:17:41,260 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11750, loss[loss=0.08483, beats_loss=0.01119, ecapa_loss=0.0001177, whisper_loss=0.07246, over 18682.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001381, whisper_loss=0.09056, over 3839032.58 frames. ], batch size: 74, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:17:43,442 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-21 08:17:45,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5155250.0, ans=0.2 2024-08-21 08:17:48,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5155250.0, ans=0.0 2024-08-21 08:17:49,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. 
limit=15.0 2024-08-21 08:17:55,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5155250.0, ans=0.0 2024-08-21 08:18:09,309 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 22 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 08:18:14,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5155450.0, ans=0.2 2024-08-21 08:18:16,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5155450.0, ans=0.025 2024-08-21 08:18:19,383 INFO [train_multi_KD3.py:845] (3/4) A total of 55 cuts. 10 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-21 08:18:27,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5155450.0, ans=0.0 2024-08-21 08:18:32,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5155550.0, ans=0.1 2024-08-21 08:18:36,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5155550.0, ans=0.0 2024-08-21 08:18:41,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.496e+01 2.849e+01 3.219e+01 3.241e+02, threshold=5.697e+01, percent-clipped=3.0 2024-08-21 08:18:43,768 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 25 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-21 08:18:57,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5155650.0, ans=0.125 2024-08-21 08:19:07,368 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11800, loss[loss=0.1121, beats_loss=0.0101, ecapa_loss=0.0001222, whisper_loss=0.1008, over 16631.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.09032, over 3813928.11 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:19:23,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-21 08:19:26,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5155850.0, ans=0.125 2024-08-21 08:19:41,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2024-08-21 08:19:53,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5155950.0, ans=0.0 2024-08-21 08:20:46,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5156150.0, ans=0.0 2024-08-21 08:20:57,604 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11850, loss[loss=0.108, beats_loss=0.01118, ecapa_loss=0.0001515, whisper_loss=0.09535, over 21934.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001379, whisper_loss=0.09045, over 3876204.54 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:21:49,934 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-21 08:21:51,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5156450.0, ans=0.2 2024-08-21 08:22:06,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=5156550.0, ans=15.0 2024-08-21 08:22:16,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5156550.0, ans=0.0 2024-08-21 08:22:17,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.264e+01 2.462e+01 2.768e+01 4.199e+02, threshold=4.924e+01, percent-clipped=1.0 2024-08-21 08:22:18,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5156550.0, ans=0.125 2024-08-21 08:22:25,572 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 08:22:35,367 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 15 from LS+wenet, 22 from Vox, 15 fro AS 2024-08-21 08:22:41,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5156650.0, ans=0.125 2024-08-21 08:22:48,156 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11900, loss[loss=0.1113, beats_loss=0.01014, ecapa_loss=0.0001174, whisper_loss=0.09998, over 17805.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001377, whisper_loss=0.09014, over 3891496.57 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:22:53,929 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
30 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-21 08:23:26,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5156850.0, ans=0.125 2024-08-21 08:23:38,025 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 17 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-21 08:23:40,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5156950.0, ans=0.125 2024-08-21 08:24:13,043 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-21 08:24:14,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2024-08-21 08:24:19,174 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-21 08:24:30,329 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 19 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-21 08:24:32,314 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 11950, loss[loss=0.1011, beats_loss=0.008529, ecapa_loss=0.0001463, whisper_loss=0.09107, over 14758.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001385, whisper_loss=0.09018, over 3893384.69 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:24:55,346 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
23 from LS+wenet, 19 from Vox, 16 fro AS 2024-08-21 08:25:13,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5157350.0, ans=0.125 2024-08-21 08:25:50,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.272e+01 2.506e+01 2.845e+01 4.517e+01, threshold=5.012e+01, percent-clipped=0.0 2024-08-21 08:26:02,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5157650.0, ans=0.0 2024-08-21 08:26:05,779 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-21 08:26:27,515 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12000, loss[loss=0.1151, beats_loss=0.01297, ecapa_loss=0.0001038, whisper_loss=0.1011, over 24207.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.000138, whisper_loss=0.08978, over 3894266.45 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:26:27,516 INFO [train_multi_KD3.py:1140] (3/4) Computing validation loss 2024-08-21 08:27:05,335 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on ASR_libri: loss=0.2549, beats_loss=0, ecapa_loss=0.0005016, whisper_loss=0.2499, over 931116.00 frames. 2024-08-21 08:27:31,499 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on SV_voxceleb1: loss=0.00396, beats_loss=0, ecapa_loss=0.000396, whisper_loss=0, over 944235.00 frames. 2024-08-21 08:29:17,123 INFO [train_multi_KD3.py:1150] (3/4) Epoch 35, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 08:29:17,127 INFO [train_multi_KD3.py:1156] (3/4) Maximum memory allocated so far is 32606MB 2024-08-21 08:29:20,181 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 24 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-21 08:29:35,111 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
17 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-21 08:29:37,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-21 08:29:41,975 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 08:30:16,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5158050.0, ans=10.0 2024-08-21 08:30:37,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2024-08-21 08:30:40,479 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 08:30:42,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5158150.0, ans=0.125 2024-08-21 08:30:46,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5158150.0, ans=0.125 2024-08-21 08:30:49,450 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12050, loss[loss=0.1044, beats_loss=0.007858, ecapa_loss=0.0001899, whisper_loss=0.09465, over 17063.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001382, whisper_loss=0.08936, over 3909619.11 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:31:20,109 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 
29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 08:31:32,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5158450.0, ans=10.0 2024-08-21 08:31:46,564 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:31:51,542 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 19 from LS+wenet, 20 from Vox, 14 fro AS 2024-08-21 08:31:56,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.247e+01 2.409e+01 2.694e+01 3.930e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-21 08:32:03,687 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 20 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-21 08:32:13,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5158650.0, ans=0.125 2024-08-21 08:32:21,774 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 21 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-21 08:32:31,276 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12100, loss[loss=0.1184, beats_loss=0.01034, ecapa_loss=0.0001379, whisper_loss=0.1067, over 22791.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001374, whisper_loss=0.08947, over 3879544.43 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:32:52,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5158750.0, ans=0.125 2024-08-21 08:33:03,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5158850.0, ans=0.1 2024-08-21 08:33:16,979 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 34 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 08:33:18,778 INFO [train_multi_KD3.py:845] (3/4) A total of 50 cuts. 
17 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-21 08:33:58,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=5159150.0, ans=15.0 2024-08-21 08:33:59,152 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 08:34:04,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5159150.0, ans=0.1 2024-08-21 08:34:05,662 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 30 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-21 08:34:11,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5159150.0, ans=0.125 2024-08-21 08:34:22,662 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 32 from Vox, 31 fro AS 2024-08-21 08:34:24,136 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12150, loss[loss=0.1033, beats_loss=0.008672, ecapa_loss=0.0001679, whisper_loss=0.09292, over 21701.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001377, whisper_loss=0.09029, over 3876577.57 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:34:31,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5159250.0, ans=0.125 2024-08-21 08:34:34,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=10.0 2024-08-21 08:34:39,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5159250.0, ans=0.2 2024-08-21 08:34:42,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.69 vs. 
limit=15.0 2024-08-21 08:34:55,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5159350.0, ans=0.1 2024-08-21 08:35:15,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5159450.0, ans=0.125 2024-08-21 08:35:17,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5159450.0, ans=0.2 2024-08-21 08:35:28,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.279e+01 2.549e+01 2.837e+01 4.060e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-21 08:35:39,230 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 08:35:54,434 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12200, loss[loss=0.1078, beats_loss=0.00947, ecapa_loss=0.0001381, whisper_loss=0.09693, over 22310.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001372, whisper_loss=0.09002, over 3855959.74 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:35:57,464 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 08:36:00,069 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.808e+00 2024-08-21 08:36:29,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5159950.0, ans=0.125 2024-08-21 08:36:44,470 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09456772357225418, model_norm_threshold=50.97699737548828 2024-08-21 08:36:44,638 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.739e+04, grad_sumsq=4.739e+04, orig_rms_sq=1.000e+00 2024-08-21 08:36:47,958 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 08:36:50,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.02 vs. limit=22.5 2024-08-21 08:37:15,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5160150.0, ans=0.0 2024-08-21 08:37:23,097 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12250, loss[loss=0.115, beats_loss=0.00874, ecapa_loss=9.1e-05, whisper_loss=0.1053, over 19070.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001375, whisper_loss=0.0899, over 3862877.59 frames. 
], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:37:24,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5160250.0, ans=0.1 2024-08-21 08:37:26,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5160250.0, ans=0.0 2024-08-21 08:37:26,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5160250.0, ans=0.0 2024-08-21 08:37:56,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5160450.0, ans=0.05 2024-08-21 08:38:16,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.62 vs. limit=6.0 2024-08-21 08:38:25,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.259e+01 2.495e+01 2.764e+01 5.391e+02, threshold=4.989e+01, percent-clipped=1.0 2024-08-21 08:38:46,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5160650.0, ans=0.2 2024-08-21 08:38:52,667 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12300, loss[loss=0.1082, beats_loss=0.00829, ecapa_loss=0.0001184, whisper_loss=0.09877, over 14342.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001373, whisper_loss=0.08959, over 3823781.20 frames. ], batch size: 50, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:39:28,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5160950.0, ans=0.0 2024-08-21 08:39:57,852 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 33 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-21 08:40:20,133 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 
29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-21 08:40:24,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. limit=22.5 2024-08-21 08:40:26,983 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12350, loss[loss=0.08916, beats_loss=0.01005, ecapa_loss=0.0001339, whisper_loss=0.07777, over 16647.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001374, whisper_loss=0.08999, over 3862507.61 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:40:27,216 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 08:40:36,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5161250.0, ans=0.0 2024-08-21 08:40:54,043 INFO [train_multi_KD3.py:845] (3/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-21 08:40:55,819 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 08:41:01,223 INFO [train_multi_KD3.py:845] (3/4) A total of 51 cuts. 18 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-21 08:41:01,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5161450.0, ans=0.0 2024-08-21 08:41:23,191 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 08:41:25,333 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
35 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-21 08:41:30,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.295e+01 2.536e+01 2.877e+01 1.914e+02, threshold=5.073e+01, percent-clipped=2.0 2024-08-21 08:41:57,743 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12400, loss[loss=0.1192, beats_loss=0.006739, ecapa_loss=0.0001708, whisper_loss=0.1108, over 17555.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.000137, whisper_loss=0.09017, over 3809360.96 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:42:08,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=22.5 2024-08-21 08:42:57,932 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 21 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 08:43:00,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5162050.0, ans=0.015 2024-08-21 08:43:02,099 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:43:02,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5162050.0, ans=0.0 2024-08-21 08:43:16,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5162050.0, ans=0.125 2024-08-21 08:43:33,260 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 08:43:44,341 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12450, loss[loss=0.09668, beats_loss=0.008853, ecapa_loss=0.0001353, whisper_loss=0.08647, over 14360.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001375, whisper_loss=0.08951, over 3812651.77 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:44:01,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5162250.0, ans=0.0 2024-08-21 08:44:14,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5162350.0, ans=0.0 2024-08-21 08:44:50,219 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-21 08:44:56,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.293e+01 2.503e+01 2.840e+01 4.657e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-21 08:45:17,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5162650.0, ans=0.125 2024-08-21 08:45:18,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5162650.0, ans=0.1 2024-08-21 08:45:25,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-08-21 08:45:27,620 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12500, loss[loss=0.1004, beats_loss=0.009832, ecapa_loss=0.0001911, whisper_loss=0.08869, over 20469.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001385, whisper_loss=0.09007, over 3810253.71 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:45:29,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2024-08-21 08:45:33,684 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 08:45:36,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5162750.0, ans=0.0 2024-08-21 08:45:46,216 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-21 08:46:04,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5162850.0, ans=0.0 2024-08-21 08:46:13,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5162950.0, ans=0.125 2024-08-21 08:46:26,491 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-21 08:46:49,380 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 20 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-21 08:46:50,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2024-08-21 08:47:10,765 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12550, loss[loss=0.1068, beats_loss=0.01163, ecapa_loss=0.0001173, whisper_loss=0.09402, over 22765.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001383, whisper_loss=0.09025, over 3823052.87 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:47:23,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5163250.0, ans=0.95 2024-08-21 08:47:28,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5163250.0, ans=0.125 2024-08-21 08:47:33,684 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 
30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 08:47:37,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5163350.0, ans=0.125 2024-08-21 08:48:02,171 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-21 08:48:10,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5163450.0, ans=0.125 2024-08-21 08:48:21,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5163550.0, ans=0.125 2024-08-21 08:48:24,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5163550.0, ans=0.0 2024-08-21 08:48:25,248 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 24 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-21 08:48:26,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.391e+01 2.623e+01 3.039e+01 4.282e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-21 08:48:28,792 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 17 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-21 08:48:33,214 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 08:48:45,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5163650.0, ans=0.125 2024-08-21 08:48:46,096 INFO [train_multi_KD3.py:845] (3/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 08:48:56,023 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12600, loss[loss=0.1168, beats_loss=0.01061, ecapa_loss=0.0001153, whisper_loss=0.105, over 23332.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.08995, over 3807123.86 frames. 
], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:48:58,563 INFO [train_multi_KD3.py:845] (3/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-21 08:49:09,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5163750.0, ans=0.125 2024-08-21 08:49:37,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5163950.0, ans=0.1 2024-08-21 08:49:44,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5163950.0, ans=0.2 2024-08-21 08:49:49,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5164050.0, ans=0.1 2024-08-21 08:50:19,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5164150.0, ans=0.2 2024-08-21 08:50:27,511 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12650, loss[loss=0.08842, beats_loss=0.0137, ecapa_loss=0.0001187, whisper_loss=0.07353, over 18991.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001372, whisper_loss=0.09002, over 3817257.51 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:50:30,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5164250.0, ans=0.125 2024-08-21 08:50:33,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5164250.0, ans=0.125 2024-08-21 08:50:33,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.65 vs. limit=10.0 2024-08-21 08:50:55,019 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 
33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-21 08:51:07,938 INFO [train_multi_KD3.py:845] (3/4) A total of 82 cuts. 32 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-21 08:51:14,247 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.90 vs. limit=6.0 2024-08-21 08:51:30,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.313e+01 2.541e+01 2.786e+01 6.490e+01, threshold=5.082e+01, percent-clipped=1.0 2024-08-21 08:51:36,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-21 08:51:38,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5164650.0, ans=0.125 2024-08-21 08:51:40,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5164650.0, ans=0.125 2024-08-21 08:51:58,196 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12700, loss[loss=0.1001, beats_loss=0.01257, ecapa_loss=0.0001274, whisper_loss=0.08628, over 21760.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001374, whisper_loss=0.0898, over 3836780.87 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:52:21,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5164850.0, ans=0.1 2024-08-21 08:52:21,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5164850.0, ans=0.0 2024-08-21 08:52:41,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. 
limit=15.0 2024-08-21 08:53:09,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5165050.0, ans=0.1 2024-08-21 08:53:13,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-21 08:53:17,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5165050.0, ans=0.1 2024-08-21 08:53:20,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2024-08-21 08:53:33,828 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0630233883857727, model_norm_threshold=50.820472717285156 2024-08-21 08:53:33,996 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.448e+05, grad_sumsq=2.272e+07, orig_rms_sq=1.077e-02 2024-08-21 08:53:46,024 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 08:53:48,956 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12750, loss[loss=0.1084, beats_loss=0.009636, ecapa_loss=0.0001211, whisper_loss=0.09758, over 19941.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01034, ecapa_loss=0.0001387, whisper_loss=0.0893, over 3826632.31 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:54:01,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5165250.0, ans=0.125 2024-08-21 08:54:58,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.01 vs. 
limit=15.0 2024-08-21 08:55:07,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.308e+01 2.534e+01 2.848e+01 8.064e+02, threshold=5.067e+01, percent-clipped=1.0 2024-08-21 08:55:13,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5165550.0, ans=0.0 2024-08-21 08:55:20,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-08-21 08:55:28,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5165650.0, ans=0.1 2024-08-21 08:55:43,317 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12800, loss[loss=0.09522, beats_loss=0.01196, ecapa_loss=0.0001554, whisper_loss=0.08171, over 21921.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01032, ecapa_loss=0.0001387, whisper_loss=0.08987, over 3860988.48 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:55:46,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5165750.0, ans=0.0 2024-08-21 08:55:53,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5165750.0, ans=0.125 2024-08-21 08:57:02,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. limit=10.0 2024-08-21 08:57:35,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5166250.0, ans=0.125 2024-08-21 08:57:36,104 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12850, loss[loss=0.09466, beats_loss=0.01258, ecapa_loss=0.0001409, whisper_loss=0.08068, over 16238.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001373, whisper_loss=0.08982, over 3874286.26 frames. ], batch size: 66, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 08:57:45,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5166250.0, ans=0.125
2024-08-21 08:57:48,791 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 34 from LS+wenet, 19 from Vox, 41 from AS
2024-08-21 08:58:03,909 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.358e+00
2024-08-21 08:58:31,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0
2024-08-21 08:58:35,867 INFO [train_multi_KD3.py:845] (3/4) A total of 69 cuts. 24 from LS+wenet, 18 from Vox, 27 from AS
2024-08-21 08:59:02,491 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.226e+01 2.436e+01 2.740e+01 3.525e+01, threshold=4.872e+01, percent-clipped=0.0
2024-08-21 08:59:02,686 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 16 from LS+wenet, 14 from Vox, 30 from AS
2024-08-21 08:59:04,994 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 27 from LS+wenet, 7 from Vox, 25 from AS
2024-08-21 08:59:34,431 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 from AS
2024-08-21 08:59:36,019 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12900, loss[loss=0.09461, beats_loss=0.0122, ecapa_loss=0.000188, whisper_loss=0.08053, over 20910.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001376, whisper_loss=0.09009, over 3843534.32 frames. ], batch size: 92, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 08:59:43,179 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 29 from LS+wenet, 37 from Vox, 28 from AS
2024-08-21 08:59:44,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.29 vs. limit=22.5
2024-08-21 09:00:18,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5166850.0, ans=0.125
2024-08-21 09:00:35,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5166950.0, ans=0.1
2024-08-21 09:00:47,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5167050.0, ans=0.0
2024-08-21 09:00:48,409 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 25 from LS+wenet, 32 from Vox, 35 from AS
2024-08-21 09:00:59,147 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0
2024-08-21 09:01:09,419 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 27 from LS+wenet, 18 from Vox, 26 from AS
2024-08-21 09:01:27,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5167150.0, ans=0.125
2024-08-21 09:01:41,510 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 12950, loss[loss=0.1016, beats_loss=0.0111, ecapa_loss=0.0001461, whisper_loss=0.08901, over 22347.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001384, whisper_loss=0.08973, over 3856248.37 frames. ], batch size: 91, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:01:41,698 INFO [train_multi_KD3.py:845] (3/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 from AS
2024-08-21 09:01:44,249 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 20 from LS+wenet, 34 from Vox, 39 from AS
2024-08-21 09:02:18,422 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 from AS
2024-08-21 09:02:42,486 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 from AS
2024-08-21 09:03:14,838 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.293e+01 2.527e+01 2.916e+01 2.821e+02, threshold=5.054e+01, percent-clipped=3.0
2024-08-21 09:03:53,540 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13000, loss[loss=0.1004, beats_loss=0.01026, ecapa_loss=0.0001591, whisper_loss=0.08853, over 22232.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001383, whisper_loss=0.08947, over 3834129.59 frames. ], batch size: 92, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:03:53,741 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 from AS
2024-08-21 09:04:45,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5167950.0, ans=0.0
2024-08-21 09:05:07,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5168050.0, ans=0.2
2024-08-21 09:05:15,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0
2024-08-21 09:05:23,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0
2024-08-21 09:05:47,085 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13050, loss[loss=0.1334, beats_loss=0.009045, ecapa_loss=0.0001289, whisper_loss=0.1231, over 22777.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001381, whisper_loss=0.0895, over 3809451.22 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:06:53,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.244e+01 2.565e+01 2.822e+01 8.760e+01, threshold=5.130e+01, percent-clipped=2.0
2024-08-21 09:06:55,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5168550.0, ans=0.2
2024-08-21 09:07:13,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5168650.0, ans=0.125
2024-08-21 09:07:22,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5168650.0, ans=0.125
2024-08-21 09:07:26,638 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13100, loss[loss=0.09054, beats_loss=0.01303, ecapa_loss=0.0001069, whisper_loss=0.07644, over 21307.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001387, whisper_loss=0.0896, over 3841226.10 frames. ], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:07:35,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=12.0
2024-08-21 09:07:42,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5168750.0, ans=0.125
2024-08-21 09:07:56,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.49 vs. limit=22.5
2024-08-21 09:08:47,745 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 from AS
2024-08-21 09:08:50,423 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 12 from LS+wenet, 17 from Vox, 28 from AS
2024-08-21 09:09:18,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5169150.0, ans=0.125
2024-08-21 09:09:31,227 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13150, loss[loss=0.1007, beats_loss=0.009676, ecapa_loss=0.0001499, whisper_loss=0.08948, over 19064.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001384, whisper_loss=0.08988, over 3822279.31 frames. ], batch size: 78, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:09:36,796 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0913950502872467, model_norm_threshold=51.30171203613281
2024-08-21 09:09:36,963 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.527e+05, grad_sumsq=4.631e+04, orig_rms_sq=3.298e+00
2024-08-21 09:09:39,542 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-21 09:10:03,781 INFO [train_multi_KD3.py:845] (3/4) A total of 52 cuts. 17 from LS+wenet, 14 from Vox, 21 from AS
2024-08-21 09:10:15,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5169350.0, ans=0.125
2024-08-21 09:10:59,217 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.235e+01 2.478e+01 2.768e+01 5.613e+02, threshold=4.956e+01, percent-clipped=2.0
2024-08-21 09:11:06,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5169550.0, ans=0.0
2024-08-21 09:11:37,589 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13200, loss[loss=0.07108, beats_loss=0.01072, ecapa_loss=0.0001735, whisper_loss=0.05862, over 18980.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001393, whisper_loss=0.08899, over 3813501.95 frames. ], batch size: 84, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:12:07,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5169850.0, ans=0.125
2024-08-21 09:12:11,792 INFO [train_multi_KD3.py:845] (3/4) A total of 71 cuts. 23 from LS+wenet, 25 from Vox, 23 from AS
2024-08-21 09:12:20,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5169850.0, ans=0.2
2024-08-21 09:12:33,118 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 31 from LS+wenet, 24 from Vox, 24 from AS
2024-08-21 09:12:45,994 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 28 from LS+wenet, 17 from Vox, 49 from AS
2024-08-21 09:12:47,951 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 20 from LS+wenet, 26 from Vox, 44 from AS
2024-08-21 09:13:01,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0
2024-08-21 09:13:19,954 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 from AS
2024-08-21 09:13:41,447 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13250, loss[loss=0.1195, beats_loss=0.009926, ecapa_loss=0.000215, whisper_loss=0.1074, over 15001.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001402, whisper_loss=0.08987, over 3803758.50 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:14:00,049 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 21 from LS+wenet, 31 from Vox, 32 from AS
2024-08-21 09:14:09,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5170350.0, ans=0.125
2024-08-21 09:14:32,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5170450.0, ans=0.125
2024-08-21 09:15:07,981 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 26 from LS+wenet, 28 from Vox, 23 from AS
2024-08-21 09:15:16,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.287e+01 2.552e+01 2.926e+01 1.195e+02, threshold=5.104e+01, percent-clipped=1.0
2024-08-21 09:15:24,211 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 35 from LS+wenet, 17 from Vox, 27 from AS
2024-08-21 09:15:41,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2024-08-21 09:15:43,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5170650.0, ans=0.2
2024-08-21 09:15:53,595 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13300, loss[loss=0.09173, beats_loss=0.01093, ecapa_loss=0.0001129, whisper_loss=0.07967, over 19071.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.00014, whisper_loss=0.09011, over 3771774.69 frames. ], batch size: 74, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:16:03,941 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 09:16:08,129 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 14 from LS+wenet, 12 from Vox, 27 from AS
2024-08-21 09:16:19,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5170850.0, ans=0.2
2024-08-21 09:16:38,463 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 from AS
2024-08-21 09:16:48,558 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 33 from LS+wenet, 18 from Vox, 43 from AS
2024-08-21 09:16:53,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5170950.0, ans=0.125
2024-08-21 09:16:57,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5170950.0, ans=0.125
2024-08-21 09:17:09,205 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 from AS
2024-08-21 09:17:41,662 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 from AS
2024-08-21 09:17:58,194 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13350, loss[loss=0.1318, beats_loss=0.008606, ecapa_loss=0.0001266, whisper_loss=0.1219, over 16246.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.000139, whisper_loss=0.08996, over 3785257.36 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:18:29,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5171350.0, ans=0.1
2024-08-21 09:18:36,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0
2024-08-21 09:18:54,962 INFO [train_multi_KD3.py:845] (3/4) A total of 54 cuts. 15 from LS+wenet, 18 from Vox, 21 from AS
2024-08-21 09:19:06,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5
2024-08-21 09:19:15,898 INFO [train_multi_KD3.py:845] (3/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 from AS
2024-08-21 09:19:23,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.340e+01 2.564e+01 2.896e+01 2.938e+02, threshold=5.128e+01, percent-clipped=2.0
2024-08-21 09:19:30,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5171550.0, ans=0.125
2024-08-21 09:19:38,752 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09627310186624527, model_norm_threshold=51.2801628112793
2024-08-21 09:19:38,914 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.670e+04, grad_sumsq=3.670e+04, orig_rms_sq=1.000e+00
2024-08-21 09:19:53,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5171650.0, ans=0.2
2024-08-21 09:19:59,574 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13400, loss[loss=0.1034, beats_loss=0.00932, ecapa_loss=0.0001319, whisper_loss=0.09276, over 16060.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01028, ecapa_loss=0.0001392, whisper_loss=0.09022, over 3781074.67 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:20:14,003 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 15 from LS+wenet, 22 from Vox, 22 from AS
2024-08-21 09:20:25,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5171850.0, ans=0.125
2024-08-21 09:20:31,192 INFO [train_multi_KD3.py:845] (3/4) A total of 73 cuts. 17 from LS+wenet, 26 from Vox, 30 from AS
2024-08-21 09:20:33,467 INFO [train_multi_KD3.py:845] (3/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 from AS
2024-08-21 09:20:49,427 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 from AS
2024-08-21 09:21:08,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.10 vs. limit=15.0
2024-08-21 09:21:12,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5172050.0, ans=0.07
2024-08-21 09:21:25,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5172050.0, ans=0.0
2024-08-21 09:21:33,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0
2024-08-21 09:21:46,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5172150.0, ans=0.125
2024-08-21 09:21:49,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5
2024-08-21 09:22:00,641 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13450, loss[loss=0.1172, beats_loss=0.009798, ecapa_loss=0.0001202, whisper_loss=0.1062, over 23463.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01026, ecapa_loss=0.0001393, whisper_loss=0.09006, over 3754410.27 frames. ], batch size: 88, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:22:04,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5172250.0, ans=0.125
2024-08-21 09:22:14,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.39 vs. limit=10.0
2024-08-21 09:22:28,628 INFO [train_multi_KD3.py:845] (3/4) A total of 86 cuts. 22 from LS+wenet, 19 from Vox, 45 from AS
2024-08-21 09:22:31,174 INFO [train_multi_KD3.py:845] (3/4) A total of 95 cuts. 24 from LS+wenet, 29 from Vox, 42 from AS
2024-08-21 09:23:08,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0
2024-08-21 09:23:12,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5172550.0, ans=0.0
2024-08-21 09:23:19,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.314e+01 2.499e+01 2.868e+01 5.327e+02, threshold=4.997e+01, percent-clipped=2.0
2024-08-21 09:23:21,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5172550.0, ans=0.0
2024-08-21 09:23:32,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5172650.0, ans=0.125
2024-08-21 09:23:34,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5172650.0, ans=0.2
2024-08-21 09:23:44,921 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 12 from LS+wenet, 18 from Vox, 29 from AS
2024-08-21 09:23:48,509 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 09:23:54,393 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13500, loss[loss=0.1155, beats_loss=0.01187, ecapa_loss=0.0001457, whisper_loss=0.1022, over 18817.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.000139, whisper_loss=0.09013, over 3790681.21 frames. ], batch size: 76, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:23:55,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5172750.0, ans=0.2
2024-08-21 09:24:16,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5172850.0, ans=0.125
2024-08-21 09:24:20,406 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 from AS
2024-08-21 09:24:42,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5172950.0, ans=0.125
2024-08-21 09:25:01,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0
2024-08-21 09:25:07,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5173050.0, ans=0.125
2024-08-21 09:25:26,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0
2024-08-21 09:25:52,235 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13550, loss[loss=0.08363, beats_loss=0.01058, ecapa_loss=0.0001462, whisper_loss=0.07159, over 12735.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01029, ecapa_loss=0.0001381, whisper_loss=0.09005, over 3780976.69 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:26:28,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5173350.0, ans=0.125
2024-08-21 09:27:16,463 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.212e+01 2.430e+01 2.813e+01 4.061e+01, threshold=4.861e+01, percent-clipped=0.0
2024-08-21 09:27:24,469 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 from AS
2024-08-21 09:27:26,889 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 35 from LS+wenet, 16 from Vox, 39 from AS
2024-08-21 09:27:32,512 INFO [train_multi_KD3.py:845] (3/4) A total of 84 cuts. 34 from LS+wenet, 16 from Vox, 34 from AS
2024-08-21 09:27:53,917 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13600, loss[loss=0.1195, beats_loss=0.006, ecapa_loss=0.0001813, whisper_loss=0.1117, over 15940.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.0001378, whisper_loss=0.08969, over 3762878.43 frames. ], batch size: 64, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:27:58,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5173750.0, ans=0.0
2024-08-21 09:28:20,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5173850.0, ans=0.125
2024-08-21 09:28:37,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5173850.0, ans=0.125
2024-08-21 09:29:12,583 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 27 from LS+wenet, 31 from Vox, 35 from AS
2024-08-21 09:29:30,237 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS
2024-08-21 09:29:36,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5174150.0, ans=0.0
2024-08-21 09:29:48,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5174150.0, ans=0.0
2024-08-21 09:29:48,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=22.5
2024-08-21 09:29:59,836 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13650, loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001407, whisper_loss=0.0911, over 22891.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01028, ecapa_loss=0.0001387, whisper_loss=0.09007, over 3784836.68 frames. ], batch size: 91, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:30:07,427 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 from AS
2024-08-21 09:30:13,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5174250.0, ans=0.125
2024-08-21 09:30:13,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5174250.0, ans=0.1
2024-08-21 09:30:33,070 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 from AS
2024-08-21 09:30:36,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5174350.0, ans=0.0
2024-08-21 09:30:39,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5174350.0, ans=0.2
2024-08-21 09:31:21,035 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 from AS
2024-08-21 09:31:26,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.278e+01 2.472e+01 2.664e+01 8.830e+01, threshold=4.945e+01, percent-clipped=1.0
2024-08-21 09:31:30,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5174550.0, ans=0.1
2024-08-21 09:31:43,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=15.0
2024-08-21 09:31:50,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5174650.0, ans=0.1
2024-08-21 09:32:04,222 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13700, loss[loss=0.08723, beats_loss=0.012, ecapa_loss=0.0001299, whisper_loss=0.07394, over 18289.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.09031, over 3795670.36 frames. ], batch size: 74, lr: 1.73e-03, grad_scale: 1.152921504606847e+18
2024-08-21 09:33:02,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5174950.0, ans=0.1
2024-08-21 09:33:14,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5175050.0, ans=0.125
2024-08-21 09:33:59,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5175150.0, ans=0.125
2024-08-21 09:34:04,757 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13750, loss[loss=0.08694, beats_loss=0.01138, ecapa_loss=0.0001592, whisper_loss=0.07397, over 14265.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01029, ecapa_loss=0.0001384, whisper_loss=0.09001, over 3754604.09 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 1.152921504606847e+18
2024-08-21 09:34:11,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0
2024-08-21 09:34:19,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5175250.0, ans=0.0
2024-08-21 09:34:30,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=22.5
2024-08-21 09:34:34,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5175350.0, ans=0.0
2024-08-21 09:34:42,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0
2024-08-21 09:34:45,511 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 09:35:08,782 INFO [train_multi_KD3.py:845] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS
2024-08-21 09:35:27,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5175550.0, ans=10.0
2024-08-21 09:35:27,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.292e+01 2.541e+01 2.786e+01 7.539e+01, threshold=5.082e+01, percent-clipped=3.0
2024-08-21 09:36:01,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5175650.0, ans=0.125
2024-08-21 09:36:06,925 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13800, loss[loss=0.1033, beats_loss=0.008311, ecapa_loss=0.0001489, whisper_loss=0.09348, over 13336.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01027, ecapa_loss=0.0001384, whisper_loss=0.0897, over 3762219.65 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:36:10,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-08-21 09:36:35,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-08-21 09:36:35,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5
2024-08-21 09:36:37,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5175850.0, ans=0.1
2024-08-21 09:36:41,629 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0
2024-08-21 09:37:13,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5175950.0, ans=0.0
2024-08-21 09:37:48,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5176050.0, ans=0.125
2024-08-21 09:38:21,659 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13850, loss[loss=0.115, beats_loss=0.008275, ecapa_loss=0.0001454, whisper_loss=0.1052, over 18609.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001375, whisper_loss=0.09016, over 3754693.28 frames. ], batch size: 71, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:39:39,471 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS
2024-08-21 09:39:58,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.312e+01 2.430e+01 2.795e+01 3.774e+01, threshold=4.861e+01, percent-clipped=0.0
2024-08-21 09:40:33,015 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13900, loss[loss=0.08794, beats_loss=0.01033, ecapa_loss=0.0001653, whisper_loss=0.07596, over 20898.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.09003, over 3769925.72 frames. ], batch size: 91, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:41:06,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5176850.0, ans=0.0
2024-08-21 09:41:10,287 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 25 from LS+wenet, 16 from Vox, 31 from AS
2024-08-21 09:41:15,403 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 from AS
2024-08-21 09:41:40,136 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-21 09:42:01,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5177050.0, ans=0.2
2024-08-21 09:42:15,158 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 19 from LS+wenet, 23 from Vox, 22 from AS
2024-08-21 09:42:28,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5177150.0, ans=0.1
2024-08-21 09:42:30,936 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.982e+01
2024-08-21 09:42:31,824 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 13950, loss[loss=0.08039, beats_loss=0.01023, ecapa_loss=0.0001649, whisper_loss=0.06852, over 15849.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.08967, over 3762646.72 frames. ], batch size: 65, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:42:36,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5177250.0, ans=0.125
2024-08-21 09:42:37,177 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 34 from LS+wenet, 21 from Vox, 33 from AS
2024-08-21 09:42:49,858 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 from AS
2024-08-21 09:42:55,337 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 from AS
2024-08-21 09:43:05,184 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 16 from LS+wenet, 21 from Vox, 19 from AS
2024-08-21 09:43:06,607 INFO [train_multi_KD3.py:845] (3/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 from AS
2024-08-21 09:43:26,598 INFO [train_multi_KD3.py:845] (3/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 from AS
2024-08-21 09:43:37,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5177450.0, ans=0.125
2024-08-21 09:43:58,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.301e+01 2.643e+01 2.947e+01 4.607e+01, threshold=5.286e+01, percent-clipped=0.0
2024-08-21 09:44:14,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5177650.0, ans=0.125
2024-08-21 09:44:24,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5177650.0, ans=0.125
2024-08-21 09:44:31,670 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14000, loss[loss=0.09339, beats_loss=0.00928, ecapa_loss=0.0001454, whisper_loss=0.08265, over 20486.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001394, whisper_loss=0.0898, over 3755801.91 frames. ], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:45:04,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0
2024-08-21 09:45:11,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0
2024-08-21 09:45:20,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5177950.0, ans=0.2
2024-08-21 09:45:35,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5177950.0, ans=0.1
2024-08-21 09:46:05,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.13 vs. limit=22.5
2024-08-21 09:46:12,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5178150.0, ans=0.2
2024-08-21 09:46:14,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5178150.0, ans=0.0
2024-08-21 09:46:20,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5178150.0, ans=0.125
2024-08-21 09:46:23,930 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14050, loss[loss=0.09311, beats_loss=0.009601, ecapa_loss=0.0001408, whisper_loss=0.0821, over 23538.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001389, whisper_loss=0.09046, over 3784608.26 frames. ], batch size: 93, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:47:11,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5178450.0, ans=0.0
2024-08-21 09:47:20,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=5178450.0, ans=0.05
2024-08-21 09:47:48,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.318e+01 2.539e+01 2.809e+01 4.112e+01, threshold=5.078e+01, percent-clipped=0.0
2024-08-21 09:48:11,066 INFO [train_multi_KD3.py:845] (3/4) A total of 53 cuts. 18 from LS+wenet, 10 from Vox, 25 from AS
2024-08-21 09:48:16,455 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14100, loss[loss=0.08807, beats_loss=0.01164, ecapa_loss=0.0001496, whisper_loss=0.07493, over 15030.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001387, whisper_loss=0.0898, over 3803188.65 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:48:36,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5178850.0, ans=0.0
2024-08-21 09:48:38,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5178850.0, ans=0.0
2024-08-21 09:48:51,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5178850.0, ans=0.125
2024-08-21 09:48:53,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0
2024-08-21 09:48:59,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5178950.0, ans=0.05
2024-08-21 09:49:10,075 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 17 from LS+wenet, 22 from Vox, 29 from AS
2024-08-21 09:49:44,956 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 from AS
2024-08-21 09:49:50,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5179150.0, ans=0.125
2024-08-21 09:49:50,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5179150.0, ans=0.125
2024-08-21 09:49:57,678 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-21 09:50:01,592 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14150, loss[loss=0.1206, beats_loss=0.008706, ecapa_loss=0.0001403, whisper_loss=0.1105, over 23182.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0103, ecapa_loss=0.0001391, whisper_loss=0.09061, over 3796945.51 frames. ], batch size: 90, lr: 1.73e-03, grad_scale: 5.764607523034235e+17
2024-08-21 09:50:04,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2024-08-21 09:50:08,211 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 from AS
2024-08-21 09:50:11,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5179250.0, ans=0.125
2024-08-21 09:50:11,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.76 vs.
limit=15.0 2024-08-21 09:50:44,059 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 15 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-21 09:51:02,146 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-21 09:51:19,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.247e+01 2.512e+01 2.809e+01 5.073e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-21 09:51:21,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5179550.0, ans=0.0 2024-08-21 09:51:22,859 INFO [train_multi_KD3.py:845] (3/4) A total of 80 cuts. 31 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-21 09:51:36,838 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 32 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-21 09:51:37,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5179650.0, ans=0.1 2024-08-21 09:51:52,971 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14200, loss[loss=0.1134, beats_loss=0.0094, ecapa_loss=0.0001517, whisper_loss=0.1025, over 19475.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01025, ecapa_loss=0.0001396, whisper_loss=0.09066, over 3777039.09 frames. ], batch size: 77, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:52:00,048 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 09:53:15,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5180050.0, ans=0.2 2024-08-21 09:53:17,336 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. 
limit=10.0 2024-08-21 09:53:31,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5180150.0, ans=0.2 2024-08-21 09:53:56,779 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14250, loss[loss=0.07686, beats_loss=0.009036, ecapa_loss=0.0001465, whisper_loss=0.06636, over 18229.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0103, ecapa_loss=0.0001394, whisper_loss=0.09037, over 3800051.44 frames. ], batch size: 73, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:54:20,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2024-08-21 09:54:27,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5180350.0, ans=0.09899494936611666 2024-08-21 09:54:54,724 INFO [train_multi_KD3.py:845] (3/4) A total of 81 cuts. 29 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-21 09:55:11,876 INFO [train_multi_KD3.py:845] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-21 09:55:16,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5180550.0, ans=0.0 2024-08-21 09:55:26,039 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.191e+01 2.440e+01 2.689e+01 6.038e+01, threshold=4.881e+01, percent-clipped=1.0 2024-08-21 09:55:29,523 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
16 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 09:55:45,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5180650.0, ans=0.125 2024-08-21 09:55:51,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5180650.0, ans=0.125 2024-08-21 09:56:03,530 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14300, loss[loss=0.1066, beats_loss=0.007478, ecapa_loss=0.0001145, whisper_loss=0.09798, over 20349.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.09082, over 3833306.90 frames. ], batch size: 74, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:56:10,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5180750.0, ans=0.125 2024-08-21 09:56:18,603 INFO [train_multi_KD3.py:845] (3/4) A total of 92 cuts. 35 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-21 09:56:36,239 INFO [train_multi_KD3.py:845] (3/4) A total of 93 cuts. 25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-21 09:57:11,852 INFO [train_multi_KD3.py:845] (3/4) A total of 64 cuts. 
19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 09:57:32,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5181050.0, ans=0.0 2024-08-21 09:57:42,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5181050.0, ans=0.125 2024-08-21 09:57:44,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5181150.0, ans=0.0 2024-08-21 09:58:01,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5181150.0, ans=0.0 2024-08-21 09:58:06,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5181150.0, ans=0.0 2024-08-21 09:58:09,808 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14350, loss[loss=0.109, beats_loss=0.01043, ecapa_loss=0.0001378, whisper_loss=0.09721, over 16992.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.09021, over 3812488.15 frames. ], batch size: 68, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:58:14,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-21 09:58:16,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5181250.0, ans=0.125 2024-08-21 09:58:32,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5181250.0, ans=0.2 2024-08-21 09:58:48,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5181350.0, ans=0.125 2024-08-21 09:59:12,026 INFO [train_multi_KD3.py:845] (3/4) A total of 60 cuts. 
15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-21 09:59:16,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5181450.0, ans=0.0 2024-08-21 09:59:30,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5181550.0, ans=0.2 2024-08-21 09:59:34,374 INFO [train_multi_KD3.py:845] (3/4) A total of 56 cuts. 13 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-21 09:59:42,537 INFO [train_multi_KD3.py:845] (3/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-21 09:59:43,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=5181550.0, ans=0.2 2024-08-21 09:59:44,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.249e+01 2.480e+01 2.767e+01 3.884e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-21 09:59:46,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5181550.0, ans=0.0 2024-08-21 10:00:10,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5181650.0, ans=0.125 2024-08-21 10:00:19,257 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14400, loss[loss=0.105, beats_loss=0.0112, ecapa_loss=0.0001396, whisper_loss=0.09241, over 21874.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.09029, over 3819230.86 frames. 
], batch size: 90, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:00:50,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5181850.0, ans=0.1 2024-08-21 10:00:56,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5181850.0, ans=0.0 2024-08-21 10:01:00,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5181850.0, ans=0.125 2024-08-21 10:01:17,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5181950.0, ans=0.0 2024-08-21 10:01:28,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2024-08-21 10:01:43,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5182050.0, ans=0.0 2024-08-21 10:01:49,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2024-08-21 10:01:56,899 INFO [train_multi_KD3.py:845] (3/4) A total of 72 cuts. 16 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-21 10:02:20,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5182150.0, ans=0.2 2024-08-21 10:02:24,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5182150.0, ans=0.125 2024-08-21 10:02:28,994 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14450, loss[loss=0.1105, beats_loss=0.007774, ecapa_loss=0.0001599, whisper_loss=0.1011, over 17779.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001385, whisper_loss=0.09058, over 3821869.59 frames. 
], batch size: 68, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:02:41,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5 2024-08-21 10:02:46,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5182250.0, ans=0.125 2024-08-21 10:03:46,993 INFO [train_multi_KD3.py:845] (3/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 10:04:00,891 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 10 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 10:04:03,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.252e+01 2.493e+01 2.789e+01 4.722e+01, threshold=4.987e+01, percent-clipped=0.0 2024-08-21 10:04:11,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5182550.0, ans=0.0 2024-08-21 10:04:20,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5182650.0, ans=0.04949747468305833 2024-08-21 10:04:26,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5182650.0, ans=0.1 2024-08-21 10:04:40,404 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14500, loss[loss=0.1094, beats_loss=0.01055, ecapa_loss=0.0001354, whisper_loss=0.09753, over 22161.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001377, whisper_loss=0.0908, over 3870518.12 frames. ], batch size: 91, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:05:07,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5182850.0, ans=0.125 2024-08-21 10:05:51,252 INFO [train_multi_KD3.py:845] (3/4) A total of 58 cuts. 
17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 10:06:06,051 INFO [train_multi_KD3.py:845] (3/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-21 10:06:08,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5183050.0, ans=0.0 2024-08-21 10:06:12,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5183050.0, ans=0.2 2024-08-21 10:06:18,593 INFO [train_multi_KD3.py:845] (3/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-21 10:06:36,922 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 21 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 10:06:38,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2024-08-21 10:06:50,541 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14550, loss[loss=0.09306, beats_loss=0.009183, ecapa_loss=0.0001411, whisper_loss=0.08246, over 18444.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001369, whisper_loss=0.09042, over 3855144.39 frames. ], batch size: 74, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:06:57,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-08-21 10:08:08,888 INFO [train_multi_KD3.py:845] (3/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 10:08:25,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.316e+01 2.552e+01 2.879e+01 5.154e+01, threshold=5.103e+01, percent-clipped=1.0 2024-08-21 10:08:59,316 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
26 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-21 10:09:01,748 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14600, loss[loss=0.109, beats_loss=0.007504, ecapa_loss=0.0001478, whisper_loss=0.1001, over 19326.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001371, whisper_loss=0.09019, over 3850638.91 frames. ], batch size: 76, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:09:02,646 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.837e+05 2024-08-21 10:09:14,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5183750.0, ans=0.0 2024-08-21 10:09:23,816 INFO [train_multi_KD3.py:845] (3/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 10:09:30,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5183850.0, ans=0.0 2024-08-21 10:10:32,792 INFO [train_multi_KD3.py:845] (3/4) A total of 83 cuts. 29 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 10:10:34,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5184050.0, ans=0.0 2024-08-21 10:11:03,541 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14650, loss[loss=0.1261, beats_loss=0.006637, ecapa_loss=0.0001621, whisper_loss=0.1178, over 12948.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001372, whisper_loss=0.08982, over 3841346.29 frames. ], batch size: 51, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:11:07,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. 
limit=15.0 2024-08-21 10:11:09,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5184250.0, ans=0.125 2024-08-21 10:11:27,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5184350.0, ans=0.0 2024-08-21 10:11:47,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5184450.0, ans=0.125 2024-08-21 10:12:21,206 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.849e+00 2024-08-21 10:12:26,782 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.280e+01 2.543e+01 2.836e+01 3.661e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-21 10:12:29,921 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 30 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 10:12:55,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5184650.0, ans=0.0 2024-08-21 10:12:56,954 INFO [train_multi_KD3.py:845] (3/4) A total of 87 cuts. 23 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 10:13:01,681 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14700, loss[loss=0.09559, beats_loss=0.01028, ecapa_loss=0.0002, whisper_loss=0.08331, over 21063.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001399, whisper_loss=0.09028, over 3849520.57 frames. 
], batch size: 91, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:13:10,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5184750.0, ans=0.0 2024-08-21 10:13:37,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5184850.0, ans=0.0 2024-08-21 10:14:11,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-08-21 10:14:29,275 INFO [train_multi_KD3.py:845] (3/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-21 10:14:50,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5185150.0, ans=0.125 2024-08-21 10:14:51,468 INFO [train_multi_KD3.py:845] (3/4) A total of 90 cuts. 29 from LS+wenet, 33 from Vox, 28 fro AS 2024-08-21 10:14:54,107 INFO [train_multi_KD3.py:845] (3/4) A total of 49 cuts. 11 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-21 10:15:00,156 INFO [train_multi_KD3.py:845] (3/4) A total of 59 cuts. 13 from LS+wenet, 9 from Vox, 37 fro AS 2024-08-21 10:15:04,799 INFO [train_multi_KD3.py:1117] (3/4) Epoch 35, batch 14750, loss[loss=0.08396, beats_loss=0.01318, ecapa_loss=0.0001124, whisper_loss=0.06966, over 21541.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.09032, over 3863326.22 frames. ], batch size: 89, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:15:37,632 INFO [train_multi_KD3.py:845] (3/4) A total of 76 cuts. 
22 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 10:16:35,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5185550.0, ans=0.0 2024-08-21 10:16:38,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5185550.0, ans=0.125 2024-08-21 10:16:39,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.269e+01 2.538e+01 2.783e+01 3.650e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-21 10:16:46,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5185550.0, ans=0.125 2024-08-21 10:16:57,053 INFO [train_multi_KD3.py:1466] (3/4) Done!