2024-08-19 16:40:02,060 INFO [train_multi_KD3.py:1188] (0/4) Training started
2024-08-19 16:40:02,066 INFO [train_multi_KD3.py:1198] (0/4) Device: cuda:0
2024-08-19 16:40:02,066 INFO [train_multi_KD3.py:1214] (0/4) Using dtype=torch.bfloat16
2024-08-19 16:40:02,067 INFO [train_multi_KD3.py:1216] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': '3210a8ed-dirty', 'icefall-git-date': 'Mon Aug 19 16:16:48 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 31, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-19 16:40:02,067 INFO [train_multi_KD3.py:1218] (0/4) About to create model
2024-08-19 16:40:02,411 INFO [model_shift.py:142] (0/4) Delta_t: 6 when computing the distillation loss
2024-08-19 16:40:02,415 INFO [train_multi_KD3.py:1222] (0/4) Number of model parameters: 66484678
2024-08-19 16:40:02,891 INFO [checkpoint.py:112] (0/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-30.pt
2024-08-19 16:40:03,554 INFO [checkpoint.py:131] (0/4) Loading averaged model
2024-08-19 16:40:04,802 INFO [train_multi_KD3.py:1237] (0/4) Using DDP
2024-08-19 16:40:06,065 INFO [train_multi_KD3.py:1249] (0/4) Loading optimizer state dict
2024-08-19 16:40:06,305 INFO [train_multi_KD3.py:1257] (0/4) Loading scheduler state dict
2024-08-19 16:40:06,305 INFO [kd_datamodule.py:690] (0/4) About to get train 960 cuts
2024-08-19 16:40:06,344 INFO [kd_datamodule.py:862] (0/4) About to get the voxceleb cuts.
2024-08-19 16:40:06,345 INFO [kd_datamodule.py:873] (0/4) Adding voxceleb2 cuts.
2024-08-19 16:40:06,346 INFO [train_multi_KD3.py:1320] (0/4) Getting audioset cuts
2024-08-19 16:40:06,347 INFO [kd_datamodule.py:881] (0/4) About to get the audioset cuts for KD.
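The hyperparameter dump above fixes the per-teacher loss scales (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0), and the per-batch loss records later in this log are consistent with a plain weighted sum: for batch 0, 0.008545 + 10 * 0.0001498 + 0.08678 ≈ 0.09683, the reported total. A minimal sketch of that combination follows; the function name and dict layout are illustrative, not taken from train_multi_KD3.py:

```python
# Hedged sketch: combine per-teacher distillation losses with the scales
# from the config dump above. Assumes one scalar loss per teacher; the
# whisper codebook term (whisper_cb_loss_scale=0.01) is omitted because
# it does not appear in the logged loss records.
def combine_kd_losses(losses, scales=None):
    """Weighted sum of per-teacher losses, mirroring how the logged
    tot_loss aggregates beats_loss, ecapa_loss and whisper_loss."""
    if scales is None:
        scales = {"beats": 1.0, "ecapa": 10.0, "whisper": 1.0}
    return sum(scales[name] * value for name, value in losses.items())

# Reproduces the batch-0 record: loss=0.09683 from the three components.
total = combine_kd_losses({"beats": 0.008545, "ecapa": 0.0001498, "whisper": 0.08678})
```

This also explains why ecapa_loss looks two orders of magnitude smaller than the other terms in every record: it is up-weighted by 10 only inside the sum.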
2024-08-19 16:40:06,349 INFO [train_multi_KD3.py:1326] (0/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-19 16:40:14,312 INFO [train_multi_KD3.py:1328] (0/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1187704) [underlying data type: ], CutSet(len=1904746) [underlying data type: ]]
2024-08-19 16:40:14,312 INFO [train_multi_KD3.py:1329] (0/4) Using weights: [1406195, 1187704, 1904746]
2024-08-19 16:40:14,312 INFO [train_multi_KD3.py:1338] (0/4) CutSet(len=4498645) [underlying data type: ]
2024-08-19 16:40:14,312 INFO [kd_datamodule.py:449] (0/4) Disable MUSAN
2024-08-19 16:40:14,314 INFO [kd_datamodule.py:489] (0/4) Disable SpecAugment
2024-08-19 16:40:14,314 INFO [kd_datamodule.py:491] (0/4) About to create train dataset
2024-08-19 16:40:14,314 INFO [kd_datamodule.py:528] (0/4) Using SimpleCutSampler
2024-08-19 16:40:14,314 INFO [kd_datamodule.py:536] (0/4) About to create train dataloader
2024-08-19 16:40:14,315 INFO [kd_datamodule.py:756] (0/4) About to get dev-clean cuts
2024-08-19 16:40:14,316 INFO [kd_datamodule.py:774] (0/4) About to get dev-other cuts
2024-08-19 16:40:14,317 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-19 16:40:14,604 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-19 16:40:14,605 INFO [kd_datamodule.py:833] (0/4) About to get the test set of voxceleb1 set.
2024-08-19 16:40:14,605 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-19 16:40:14,843 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-19 16:40:14,843 INFO [kd_datamodule.py:893] (0/4) About to get the audioset eval cuts.
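The mux step above interleaves three CutSets with weights equal to their lengths ([1406195, 1187704, 1904746], totalling the 4498645 cuts reported). A stdlib-only toy in the spirit of lhotse's `CutSet.mux`, illustrating only the sampling rule (draw the next item from a stream with probability proportional to its weight); it is not the lhotse implementation:

```python
import random

# Hedged sketch of weight-proportional muxing: the next item comes from
# stream i with probability weights[i] / sum(weights); exhausted streams
# are dropped, so every item is eventually yielded exactly once.
def mux(streams, weights, seed=42):
    rng = random.Random(seed)
    iterators = [iter(s) for s in streams]
    weights = list(weights)
    while iterators:
        i = rng.choices(range(len(iterators)), weights=weights)[0]
        try:
            yield next(iterators[i])
        except StopIteration:
            del iterators[i]
            del weights[i]

# Example: two toy "datasets" muxed 3:2, as the log muxes corpora by size.
combined = list(mux([range(3), range(10, 12)], weights=[3, 2]))
```

Weighting by CutSet length, as the log does, makes the mixed stream visit each corpus in proportion to its size, which matches the per-batch cut counts reported later ("19 from LS+wenet, 24 from Vox, 32 from AS" and similar).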
2024-08-19 16:40:14,844 INFO [kd_datamodule.py:570] (0/4) About to create dev dataset
2024-08-19 16:40:15,359 INFO [kd_datamodule.py:591] (0/4) About to create dev dataloader
2024-08-19 16:40:15,359 INFO [train_multi_KD3.py:1418] (0/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset']
2024-08-19 16:40:15,360 INFO [train_multi_KD3.py:1422] (0/4) Loading grad scaler state dict
2024-08-19 16:40:31,732 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 0, loss[loss=0.09683, beats_loss=0.008545, ecapa_loss=0.0001498, whisper_loss=0.08678, over 23157.00 frames. ], tot_loss[loss=0.09683, beats_loss=0.008545, ecapa_loss=0.0001498, whisper_loss=0.08678, over 23157.00 frames. ], batch size: 91, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:40:31,734 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss
2024-08-19 16:41:05,573 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.0005148, whisper_loss=0.2478, over 931116.00 frames.
2024-08-19 16:41:25,482 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003992, beats_loss=0, ecapa_loss=0.0003992, whisper_loss=0, over 944235.00 frames.
2024-08-19 16:42:59,030 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.9238, 1.5195, 1.6963, 1.5809, 1.8080, 1.5377, 1.6083, 1.5093], device='cuda:0')
2024-08-19 16:42:59,786 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
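The three validation records above each exercise only the teachers relevant to that task: ASR_libri reports whisper and ecapa terms, SV_voxceleb1 only ecapa, AT_audioset only beats. One way to express that task-to-teacher routing is a simple mask; the mapping below is read off the log records, not taken from the training code, so treat it as an assumption:

```python
# Hedged sketch: zero out the loss terms a given evaluation task does not
# use, matching the zeros in the validation records above. The dict keys
# are the task names the log prints at train_multi_KD3.py:1418.
TASK_TEACHERS = {
    "ASR_libri": ("whisper", "ecapa"),  # ASR validation also reports ecapa_loss
    "SV_voxceleb1": ("ecapa",),
    "AT_audioset": ("beats",),
}

def masked_losses(task, losses):
    """Keep only the teacher losses active for `task`; zero the rest."""
    active = TASK_TEACHERS[task]
    return {name: (value if name in active else 0.0) for name, value in losses.items()}
```

Under this reading, the AT_audioset record (loss=0.02301, beats_loss=0.02301, others 0) is just the beats term passing through unmodified.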
2024-08-19 16:42:59,788 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB
2024-08-19 16:43:08,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4445890.0, ans=0.125
2024-08-19 16:43:36,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4445990.0, ans=0.1
2024-08-19 16:43:55,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4446090.0, ans=0.0
2024-08-19 16:44:43,279 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.403e+01 2.746e+01 3.133e+01 8.282e+01, threshold=5.492e+01, percent-clipped=1.0
2024-08-19 16:44:53,171 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 19 from LS+wenet, 24 from Vox, 32 from AS
2024-08-19 16:44:56,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0
2024-08-19 16:45:00,510 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 50, loss[loss=0.1152, beats_loss=0.006495, ecapa_loss=0.0001498, whisper_loss=0.1072, over 18250.00 frames. ], tot_loss[loss=0.09833, beats_loss=0.009242, ecapa_loss=0.0001466, whisper_loss=0.08762, over 861656.05 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:45:09,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4446390.0, ans=0.125
2024-08-19 16:45:13,382 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 from AS
2024-08-19 16:45:20,972 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS
2024-08-19 16:45:36,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4446490.0, ans=0.125
2024-08-19 16:45:46,386 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 from AS
2024-08-19 16:46:05,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4446590.0, ans=0.125
2024-08-19 16:46:14,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4446690.0, ans=0.125
2024-08-19 16:46:37,657 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-19 16:46:42,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4446790.0, ans=0.1
2024-08-19 16:46:55,258 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 100, loss[loss=0.05772, beats_loss=0.01276, ecapa_loss=0.0001467, whisper_loss=0.04349, over 15945.00 frames. ], tot_loss[loss=0.09879, beats_loss=0.009281, ecapa_loss=0.0001435, whisper_loss=0.08807, over 1514115.56 frames. ], batch size: 65, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:46:55,463 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 10 from LS+wenet, 20 from Vox, 21 from AS
2024-08-19 16:47:01,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5
2024-08-19 16:47:20,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0
2024-08-19 16:47:39,336 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 from AS
2024-08-19 16:47:55,850 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 from AS
2024-08-19 16:47:56,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4447090.0, ans=0.0
2024-08-19 16:48:00,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4447190.0, ans=0.0
2024-08-19 16:48:02,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0
2024-08-19 16:48:24,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.103e+01 2.615e+01 2.787e+01 3.101e+01 5.493e+01, threshold=5.575e+01, percent-clipped=1.0
2024-08-19 16:48:40,686 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 150, loss[loss=0.09476, beats_loss=0.009054, ecapa_loss=0.0001386, whisper_loss=0.08432, over 21452.00 frames. ], tot_loss[loss=0.09979, beats_loss=0.009159, ecapa_loss=0.0001445, whisper_loss=0.08919, over 1996888.11 frames. ], batch size: 87, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:48:41,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4447390.0, ans=0.0
2024-08-19 16:48:45,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0
2024-08-19 16:49:34,820 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 from AS
2024-08-19 16:49:47,585 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 20 from LS+wenet, 23 from Vox, 24 from AS
2024-08-19 16:50:12,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4447890.0, ans=0.0
2024-08-19 16:50:13,668 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 200, loss[loss=0.08807, beats_loss=0.01165, ecapa_loss=0.0001535, whisper_loss=0.07488, over 19082.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.009463, ecapa_loss=0.0001438, whisper_loss=0.08919, over 2371903.29 frames. ], batch size: 76, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:50:17,465 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 from AS
2024-08-19 16:50:17,752 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-19 16:50:32,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4447990.0, ans=0.125
2024-08-19 16:50:38,861 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 22 from LS+wenet, 13 from Vox, 18 from AS
2024-08-19 16:50:58,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.93 vs. limit=10.0
2024-08-19 16:51:09,864 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 22 from LS+wenet, 22 from Vox, 43 from AS
2024-08-19 16:51:11,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4448190.0, ans=0.125
2024-08-19 16:51:20,982 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS
2024-08-19 16:51:24,145 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.343e+01 2.586e+01 2.857e+01 5.487e+01, threshold=5.172e+01, percent-clipped=0.0
2024-08-19 16:51:26,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4448290.0, ans=0.0
2024-08-19 16:51:33,368 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 16:51:38,000 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 250, loss[loss=0.09679, beats_loss=0.01214, ecapa_loss=0.0001585, whisper_loss=0.08307, over 16627.00 frames. ], tot_loss[loss=0.09997, beats_loss=0.009842, ecapa_loss=0.0001425, whisper_loss=0.0887, over 2650148.56 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:51:38,209 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 37 from LS+wenet, 21 from Vox, 36 from AS
2024-08-19 16:51:43,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4448390.0, ans=0.1
2024-08-19 16:52:25,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4448690.0, ans=0.0
2024-08-19 16:52:29,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4448690.0, ans=0.0
2024-08-19 16:52:50,656 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 36 from LS+wenet, 18 from Vox, 36 from AS
2024-08-19 16:52:55,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4448790.0, ans=0.0
2024-08-19 16:52:55,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs.
limit=15.0
2024-08-19 16:53:03,183 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 300, loss[loss=0.07142, beats_loss=0.01566, ecapa_loss=0.0001081, whisper_loss=0.05468, over 21752.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01005, ecapa_loss=0.0001425, whisper_loss=0.08883, over 2908757.36 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17
2024-08-19 16:53:16,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4448890.0, ans=0.0
2024-08-19 16:53:18,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4448990.0, ans=0.125
2024-08-19 16:53:33,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4449090.0, ans=0.0
2024-08-19 16:53:36,443 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 from AS
2024-08-19 16:53:38,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4449090.0, ans=0.125
2024-08-19 16:54:06,325 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 from AS
2024-08-19 16:54:11,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.292e+01 2.578e+01 2.922e+01 3.653e+02, threshold=5.156e+01, percent-clipped=3.0
2024-08-19 16:54:11,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4449290.0, ans=0.0
2024-08-19 16:54:23,843 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 350, loss[loss=0.06856, beats_loss=0.01211, ecapa_loss=0.0001057, whisper_loss=0.05539, over 16330.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01014, ecapa_loss=0.0001412, whisper_loss=0.08914, over 3083346.56 frames. ], batch size: 62, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:54:25,580 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 19 from LS+wenet, 10 from Vox, 24 from AS
2024-08-19 16:54:25,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4449390.0, ans=0.0
2024-08-19 16:54:29,017 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 from AS
2024-08-19 16:55:04,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0
2024-08-19 16:55:39,073 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 from AS
2024-08-19 16:55:42,086 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 17 from Vox, 37 from AS
2024-08-19 16:55:43,244 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 400, loss[loss=0.119, beats_loss=0.009663, ecapa_loss=0.0001267, whisper_loss=0.1081, over 23218.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01022, ecapa_loss=0.0001397, whisper_loss=0.08859, over 3231342.42 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:55:50,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4449890.0, ans=0.0
2024-08-19 16:55:53,827 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 25 from LS+wenet, 16 from Vox, 41 from AS
2024-08-19 16:55:58,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4449990.0, ans=0.5
2024-08-19 16:56:04,658 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 from AS
2024-08-19 16:56:11,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4449990.0, ans=0.125
2024-08-19 16:56:33,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4450190.0, ans=0.125
2024-08-19 16:56:34,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4450190.0, ans=0.025
2024-08-19 16:56:40,092 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.647e-02
2024-08-19 16:56:52,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.167e+01 2.459e+01 2.674e+01 3.849e+01, threshold=4.918e+01, percent-clipped=0.0
2024-08-19 16:56:57,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4450290.0, ans=0.0
2024-08-19 16:57:03,515 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08908264338970184, model_norm_threshold=49.18006896972656
2024-08-19 16:57:03,674 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.931e+04, grad_sumsq=3.931e+04, orig_rms_sq=1.000e+00
2024-08-19 16:57:03,888 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 from AS
2024-08-19 16:57:05,399 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 450, loss[loss=0.08138, beats_loss=0.0119, ecapa_loss=0.0001756, whisper_loss=0.06773, over 15474.00 frames. ], tot_loss[loss=0.09926, beats_loss=0.01035, ecapa_loss=0.00014, whisper_loss=0.08751, over 3326167.54 frames. ], batch size: 64, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:57:31,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4450490.0, ans=0.05
2024-08-19 16:57:45,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4450590.0, ans=0.125
2024-08-19 16:57:47,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4450590.0, ans=10.0
2024-08-19 16:57:56,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4450690.0, ans=0.0
2024-08-19 16:58:05,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0
2024-08-19 16:58:26,576 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 14 from LS+wenet, 13 from Vox, 30 from AS
2024-08-19 16:58:28,190 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 500, loss[loss=0.08382, beats_loss=0.01228, ecapa_loss=9.353e-05, whisper_loss=0.0706, over 15128.00 frames. ], tot_loss[loss=0.09952, beats_loss=0.01035, ecapa_loss=0.00014, whisper_loss=0.08778, over 3428194.77 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:58:52,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.25 vs.
limit=10.0
2024-08-19 16:59:02,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4451090.0, ans=0.0
2024-08-19 16:59:09,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4451090.0, ans=0.125
2024-08-19 16:59:18,001 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 from AS
2024-08-19 16:59:36,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.363e+01 2.706e+01 3.045e+01 5.521e+02, threshold=5.412e+01, percent-clipped=1.0
2024-08-19 16:59:49,868 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 550, loss[loss=0.1125, beats_loss=0.008401, ecapa_loss=0.0001681, whisper_loss=0.1024, over 17863.00 frames. ], tot_loss[loss=0.09957, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.08783, over 3519970.75 frames. ], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 16:59:55,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4451390.0, ans=0.125
2024-08-19 17:00:08,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4451390.0, ans=0.1
2024-08-19 17:00:12,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=4451390.0, ans=0.1
2024-08-19 17:00:32,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4451590.0, ans=0.0
2024-08-19 17:00:53,799 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 from AS
2024-08-19 17:00:57,168 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 from AS
2024-08-19 17:01:11,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4451790.0, ans=0.0
2024-08-19 17:01:16,106 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=15.0
2024-08-19 17:01:27,122 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 600, loss[loss=0.05896, beats_loss=0.01173, ecapa_loss=0.0001307, whisper_loss=0.04593, over 19763.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01037, ecapa_loss=0.0001403, whisper_loss=0.088, over 3576217.77 frames. ], batch size: 79, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:01:27,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4451890.0, ans=0.125
2024-08-19 17:01:52,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4451990.0, ans=0.2
2024-08-19 17:01:57,861 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 23 from LS+wenet, 22 from Vox, 27 from AS
2024-08-19 17:01:59,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4452090.0, ans=0.1
2024-08-19 17:02:05,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4452090.0, ans=0.125
2024-08-19 17:02:13,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4452090.0, ans=0.0
2024-08-19 17:02:25,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4452190.0, ans=0.125
2024-08-19 17:02:29,817 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 from AS
2024-08-19 17:02:38,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.243e+01 2.476e+01 2.748e+01 6.280e+01, threshold=4.953e+01, percent-clipped=2.0
2024-08-19 17:02:43,756 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 from AS
2024-08-19 17:02:53,189 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 650, loss[loss=0.112, beats_loss=0.006076, ecapa_loss=0.000187, whisper_loss=0.104, over 13167.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.08842, over 3631503.60 frames. ], batch size: 52, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:02:53,418 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 11 from Vox, 30 from AS
2024-08-19 17:03:02,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4452390.0, ans=0.1
2024-08-19 17:03:05,597 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 from AS
2024-08-19 17:03:17,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0
2024-08-19 17:03:54,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0
2024-08-19 17:04:01,986 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 14 from Vox, 32 from AS
2024-08-19 17:04:10,006 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 17 from LS+wenet, 19 from Vox, 30 from AS
2024-08-19 17:04:13,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4452790.0, ans=0.5
2024-08-19 17:04:17,930 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 700, loss[loss=0.1, beats_loss=0.009138, ecapa_loss=0.0001344, whisper_loss=0.08956, over 18848.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01041, ecapa_loss=0.0001389, whisper_loss=0.0885, over 3643696.60 frames. ], batch size: 71, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:04:33,347 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 from AS
2024-08-19 17:05:11,332 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 from AS
2024-08-19 17:05:27,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.284e+01 2.551e+01 2.853e+01 6.068e+01, threshold=5.102e+01, percent-clipped=1.0
2024-08-19 17:05:28,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4453290.0, ans=0.125
2024-08-19 17:05:38,031 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS
2024-08-19 17:05:38,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4453290.0, ans=0.0
2024-08-19 17:05:41,200 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 750, loss[loss=0.1161, beats_loss=0.009222, ecapa_loss=0.0001401, whisper_loss=0.1055, over 17917.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.08877, over 3673867.78 frames. ], batch size: 67, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:05:46,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4453390.0, ans=0.0
2024-08-19 17:05:50,045 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 17:06:05,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4453490.0, ans=0.125
2024-08-19 17:06:21,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4453590.0, ans=0.04949747468305833
2024-08-19 17:06:22,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4453590.0, ans=0.125
2024-08-19 17:06:25,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4453590.0, ans=0.0
2024-08-19 17:06:39,613 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 from AS
2024-08-19 17:06:53,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0
2024-08-19 17:07:07,256 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 800, loss[loss=0.1149, beats_loss=0.007836, ecapa_loss=0.0001555, whisper_loss=0.1055, over 15637.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.0001402, whisper_loss=0.0897, over 3710272.76 frames. ], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:07:23,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4453990.0, ans=10.0
2024-08-19 17:08:06,023 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS
2024-08-19 17:08:19,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.286e+01 2.525e+01 2.905e+01 4.318e+01, threshold=5.049e+01, percent-clipped=0.0
2024-08-19 17:08:26,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4454290.0, ans=0.125
2024-08-19 17:08:32,939 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 850, loss[loss=0.08029, beats_loss=0.01037, ecapa_loss=9.383e-05, whisper_loss=0.06898, over 15061.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01031, ecapa_loss=0.0001403, whisper_loss=0.08928, over 3766795.57 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 17:08:36,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4454390.0, ans=0.125
2024-08-19 17:08:42,791 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 14 from LS+wenet, 17 from Vox, 23 from AS
2024-08-19 17:09:27,803 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 33 from LS+wenet, 23 from Vox, 31 from AS
2024-08-19 17:09:31,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0
2024-08-19 17:09:59,474 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 900, loss[loss=0.1183, beats_loss=0.01014, ecapa_loss=0.0001125, whisper_loss=0.107, over 21208.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0102, ecapa_loss=0.0001408, whisper_loss=0.08907, over 3735998.71 frames.
], batch size: 80, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:10:16,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4454990.0, ans=0.0 2024-08-19 17:10:26,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4454990.0, ans=0.0 2024-08-19 17:10:30,313 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 17:10:41,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2024-08-19 17:10:46,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4455090.0, ans=0.125 2024-08-19 17:10:48,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4455090.0, ans=0.1 2024-08-19 17:10:51,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4455190.0, ans=0.125 2024-08-19 17:10:53,151 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 17:11:06,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4455290.0, ans=0.0 2024-08-19 17:11:07,869 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 17:11:11,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.270e+01 2.580e+01 3.222e+01 2.488e+02, threshold=5.161e+01, percent-clipped=3.0 2024-08-19 17:11:19,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4455290.0, ans=0.0 2024-08-19 17:11:24,785 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 950, loss[loss=0.1073, beats_loss=0.009361, ecapa_loss=0.0001246, whisper_loss=0.09666, over 19540.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01016, ecapa_loss=0.0001401, whisper_loss=0.08968, over 3757498.93 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:11:42,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4455490.0, ans=0.1 2024-08-19 17:11:43,300 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 25 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 17:11:57,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4455590.0, ans=0.0 2024-08-19 17:12:01,531 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:12:05,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. 
limit=15.0 2024-08-19 17:12:11,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4455590.0, ans=0.2 2024-08-19 17:12:13,421 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:12:31,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-19 17:12:32,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=12.0 2024-08-19 17:12:37,271 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 17:12:42,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4455790.0, ans=0.125 2024-08-19 17:12:49,082 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1000, loss[loss=0.08753, beats_loss=0.01137, ecapa_loss=0.0001596, whisper_loss=0.07457, over 15624.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01017, ecapa_loss=0.0001406, whisper_loss=0.0886, over 3737940.49 frames. ], batch size: 66, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:12:52,698 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 17 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 17:13:26,851 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 15 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 17:13:34,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2024-08-19 17:13:37,251 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 17:13:51,879 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 17:13:59,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.231e+01 2.578e+01 2.915e+01 8.708e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 17:14:12,621 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1050, loss[loss=0.1176, beats_loss=0.01166, ecapa_loss=0.0001205, whisper_loss=0.1047, over 23032.00 frames. ], tot_loss[loss=0.09949, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.08779, over 3737763.45 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:14:15,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0 2024-08-19 17:14:31,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4456490.0, ans=0.125 2024-08-19 17:14:53,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4456590.0, ans=0.0 2024-08-19 17:15:05,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4456690.0, ans=0.1 2024-08-19 17:15:16,120 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 17:15:23,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4456790.0, ans=0.0 2024-08-19 17:15:25,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.59 vs. limit=10.0 2024-08-19 17:15:28,512 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 17:15:37,030 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1100, loss[loss=0.0983, beats_loss=0.01133, ecapa_loss=0.0001409, whisper_loss=0.08557, over 21929.00 frames. ], tot_loss[loss=0.09974, beats_loss=0.01026, ecapa_loss=0.0001388, whisper_loss=0.08809, over 3741820.48 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:15:43,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4456890.0, ans=0.125 2024-08-19 17:15:47,912 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 25 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 17:16:12,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4457090.0, ans=0.0 2024-08-19 17:16:15,543 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 32 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 17:16:15,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4457090.0, ans=0.125 2024-08-19 17:16:42,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4457190.0, ans=0.125 2024-08-19 17:16:48,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.222e+01 2.508e+01 2.804e+01 3.305e+02, threshold=5.015e+01, percent-clipped=2.0 2024-08-19 17:16:50,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4457290.0, ans=0.125 2024-08-19 17:16:52,178 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 25 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-19 17:16:56,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.64 vs. 
limit=22.5 2024-08-19 17:17:00,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2024-08-19 17:17:01,925 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1150, loss[loss=0.08894, beats_loss=0.009731, ecapa_loss=0.0001588, whisper_loss=0.07762, over 20461.00 frames. ], tot_loss[loss=0.09964, beats_loss=0.01034, ecapa_loss=0.0001379, whisper_loss=0.08793, over 3734811.89 frames. ], batch size: 84, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:17:13,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4457390.0, ans=0.125 2024-08-19 17:17:16,943 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 30 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 17:17:17,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4457390.0, ans=0.125 2024-08-19 17:17:42,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4457590.0, ans=0.1 2024-08-19 17:17:43,116 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 17:17:54,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.20 vs. limit=22.5 2024-08-19 17:18:26,397 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1200, loss[loss=0.06074, beats_loss=0.01475, ecapa_loss=0.0001169, whisper_loss=0.04482, over 17056.00 frames. ], tot_loss[loss=0.09958, beats_loss=0.01037, ecapa_loss=0.0001376, whisper_loss=0.08783, over 3734975.55 frames. 
], batch size: 70, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:18:35,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4457890.0, ans=0.04949747468305833 2024-08-19 17:18:47,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4457990.0, ans=0.125 2024-08-19 17:19:18,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4458190.0, ans=0.125 2024-08-19 17:19:23,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2024-08-19 17:19:33,142 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 17:19:36,582 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.677e+01 2.243e+01 2.469e+01 2.791e+01 3.736e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-19 17:19:38,765 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 30 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 17:19:50,490 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1250, loss[loss=0.08341, beats_loss=0.01148, ecapa_loss=0.0001347, whisper_loss=0.07058, over 13446.00 frames. ], tot_loss[loss=0.09951, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.08772, over 3719790.88 frames. 
], batch size: 54, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:20:02,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4458390.0, ans=0.2 2024-08-19 17:20:09,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4458490.0, ans=0.1 2024-08-19 17:20:15,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4458490.0, ans=0.2 2024-08-19 17:20:39,880 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:20:46,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4458690.0, ans=0.125 2024-08-19 17:20:48,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.74 vs. limit=15.0 2024-08-19 17:21:12,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4458790.0, ans=0.5 2024-08-19 17:21:15,037 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1300, loss[loss=0.1106, beats_loss=0.009707, ecapa_loss=0.000155, whisper_loss=0.09938, over 19944.00 frames. ], tot_loss[loss=0.09945, beats_loss=0.01044, ecapa_loss=0.0001381, whisper_loss=0.08763, over 3700739.40 frames. ], batch size: 82, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:21:19,614 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 
18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 17:21:21,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4458890.0, ans=0.1 2024-08-19 17:21:41,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4458990.0, ans=0.05 2024-08-19 17:21:50,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4459090.0, ans=0.1 2024-08-19 17:22:18,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4459190.0, ans=0.025 2024-08-19 17:22:23,940 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.156e+01 2.325e+01 2.640e+01 4.207e+01, threshold=4.651e+01, percent-clipped=0.0 2024-08-19 17:22:24,167 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 17:22:31,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-08-19 17:22:37,394 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1350, loss[loss=0.1151, beats_loss=0.007673, ecapa_loss=0.0001475, whisper_loss=0.106, over 14187.00 frames. ], tot_loss[loss=0.09986, beats_loss=0.01046, ecapa_loss=0.0001369, whisper_loss=0.08803, over 3722377.16 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:22:52,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2024-08-19 17:22:59,474 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
15 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 17:23:14,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4459590.0, ans=0.125 2024-08-19 17:23:17,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4459590.0, ans=0.125 2024-08-19 17:23:43,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4459790.0, ans=0.0 2024-08-19 17:23:46,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4459790.0, ans=0.125 2024-08-19 17:23:47,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4459790.0, ans=0.125 2024-08-19 17:23:59,856 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1400, loss[loss=0.09495, beats_loss=0.0122, ecapa_loss=0.0001369, whisper_loss=0.08138, over 22313.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01043, ecapa_loss=0.0001365, whisper_loss=0.08828, over 3726501.25 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:24:07,844 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 17:24:21,065 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 17:24:22,302 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 34 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 17:24:38,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4460090.0, ans=0.1 2024-08-19 17:24:38,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.29 vs. 
limit=15.0 2024-08-19 17:24:39,391 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 17:24:47,998 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 28 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 17:25:07,907 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 17:25:11,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.246e+01 2.446e+01 2.816e+01 8.915e+01, threshold=4.891e+01, percent-clipped=2.0 2024-08-19 17:25:14,677 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-19 17:25:24,145 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1450, loss[loss=0.1115, beats_loss=0.007935, ecapa_loss=0.0001528, whisper_loss=0.102, over 14109.00 frames. ], tot_loss[loss=0.09992, beats_loss=0.01034, ecapa_loss=0.0001373, whisper_loss=0.0882, over 3769385.47 frames. ], batch size: 57, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:25:38,027 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 17:25:47,885 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 17:25:48,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4460490.0, ans=0.125 2024-08-19 17:26:00,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4460590.0, ans=0.125 2024-08-19 17:26:03,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4460590.0, ans=0.2 2024-08-19 17:26:08,249 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 17:26:23,358 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:26:44,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4460790.0, ans=0.1 2024-08-19 17:26:46,817 WARNING [optim.py:496] (0/4) Scaling gradients by 0.051883164793252945, model_norm_threshold=48.91460418701172 2024-08-19 17:26:47,246 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.482e+04, grad_sumsq=2.578e+04, orig_rms_sq=3.290e+00 2024-08-19 17:26:47,455 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 13 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 17:26:51,991 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 10 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 17:26:53,312 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1500, loss[loss=0.07222, beats_loss=0.01104, ecapa_loss=0.0001599, whisper_loss=0.05959, over 12218.00 frames. ], tot_loss[loss=0.09941, beats_loss=0.01035, ecapa_loss=0.0001367, whisper_loss=0.08769, over 3725460.24 frames. ], batch size: 50, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:27:00,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. limit=10.0 2024-08-19 17:27:22,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4460990.0, ans=0.125 2024-08-19 17:27:45,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4461190.0, ans=0.1 2024-08-19 17:28:07,886 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 17:28:09,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.242e+01 2.515e+01 2.868e+01 9.428e+02, threshold=5.031e+01, percent-clipped=1.0 2024-08-19 17:28:22,810 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1550, loss[loss=0.1075, beats_loss=0.01079, ecapa_loss=0.0001137, whisper_loss=0.09554, over 20672.00 frames. ], tot_loss[loss=0.09906, beats_loss=0.01031, ecapa_loss=0.0001374, whisper_loss=0.08738, over 3714541.47 frames. ], batch size: 77, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:28:23,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-08-19 17:28:36,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-08-19 17:28:41,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4461490.0, ans=0.125 2024-08-19 17:28:53,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4461490.0, ans=0.0 2024-08-19 17:29:05,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2024-08-19 17:29:10,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4461590.0, ans=0.125 2024-08-19 17:29:36,269 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
38 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 17:29:36,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4461790.0, ans=0.2 2024-08-19 17:29:38,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4461790.0, ans=0.125 2024-08-19 17:29:41,690 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-19 17:29:50,173 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1600, loss[loss=0.1021, beats_loss=0.009419, ecapa_loss=0.0001381, whisper_loss=0.09134, over 15596.00 frames. ], tot_loss[loss=0.09968, beats_loss=0.01028, ecapa_loss=0.0001375, whisper_loss=0.08803, over 3726601.69 frames. ], batch size: 61, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:29:57,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4461890.0, ans=0.0 2024-08-19 17:29:59,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4461890.0, ans=0.125 2024-08-19 17:30:04,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4461890.0, ans=0.125 2024-08-19 17:30:09,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4461990.0, ans=0.125 2024-08-19 17:30:18,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.64 vs. 
limit=15.0 2024-08-19 17:31:00,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4462290.0, ans=0.125 2024-08-19 17:31:03,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.243e+01 2.444e+01 2.645e+01 4.310e+01, threshold=4.888e+01, percent-clipped=0.0 2024-08-19 17:31:06,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0 2024-08-19 17:31:11,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4462290.0, ans=0.95 2024-08-19 17:31:14,453 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 10 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-19 17:31:17,908 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1650, loss[loss=0.1073, beats_loss=0.008488, ecapa_loss=0.0001246, whisper_loss=0.09756, over 17567.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01026, ecapa_loss=0.0001378, whisper_loss=0.08862, over 3750221.79 frames. ], batch size: 67, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:32:14,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2024-08-19 17:32:29,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4462790.0, ans=0.09899494936611666 2024-08-19 17:32:31,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.76 vs. 
limit=12.0 2024-08-19 17:32:41,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4462890.0, ans=0.125 2024-08-19 17:32:43,017 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1700, loss[loss=0.1172, beats_loss=0.008514, ecapa_loss=0.0001283, whisper_loss=0.1074, over 14649.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01017, ecapa_loss=0.000138, whisper_loss=0.08967, over 3749905.63 frames. ], batch size: 53, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:32:46,828 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 17:33:11,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4462990.0, ans=0.125 2024-08-19 17:33:12,919 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 33 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 17:33:24,410 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 17:33:31,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4463090.0, ans=0.125 2024-08-19 17:33:54,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.265e+01 2.450e+01 2.804e+01 4.783e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-19 17:34:06,637 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 17:34:08,171 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1750, loss[loss=0.09115, beats_loss=0.01226, ecapa_loss=0.000133, whisper_loss=0.07757, over 21306.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01018, ecapa_loss=0.0001372, whisper_loss=0.08996, over 3766038.02 frames. 
], batch size: 87, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:34:33,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4463490.0, ans=0.125 2024-08-19 17:34:48,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4463590.0, ans=0.125 2024-08-19 17:35:03,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4463690.0, ans=0.125 2024-08-19 17:35:24,407 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 24 from LS+wenet, 17 from Vox, 16 fro AS 2024-08-19 17:35:24,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4463790.0, ans=0.0 2024-08-19 17:35:31,775 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1800, loss[loss=0.08672, beats_loss=0.01094, ecapa_loss=0.0001012, whisper_loss=0.07477, over 17763.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01017, ecapa_loss=0.0001388, whisper_loss=0.08935, over 3763920.92 frames. ], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:35:49,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4463990.0, ans=0.2 2024-08-19 17:35:51,009 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 17:36:02,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4464090.0, ans=0.07 2024-08-19 17:36:04,564 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 17:36:09,556 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 22 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 17:36:16,381 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 
17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 17:36:18,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4464090.0, ans=0.2 2024-08-19 17:36:21,085 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 17:36:21,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4464190.0, ans=0.2 2024-08-19 17:36:23,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4464190.0, ans=0.0 2024-08-19 17:36:34,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4464190.0, ans=0.125 2024-08-19 17:36:34,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4464190.0, ans=0.0 2024-08-19 17:36:40,214 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.242e+01 2.530e+01 2.809e+01 4.955e+01, threshold=5.060e+01, percent-clipped=1.0 2024-08-19 17:36:53,834 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1850, loss[loss=0.09162, beats_loss=0.009627, ecapa_loss=0.0001469, whisper_loss=0.08052, over 22045.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01026, ecapa_loss=0.0001376, whisper_loss=0.08886, over 3767625.93 frames. ], batch size: 89, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:37:05,517 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 21 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 17:37:07,002 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 17:37:17,026 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
27 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 17:37:37,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4464590.0, ans=0.1 2024-08-19 17:37:37,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4464590.0, ans=0.1 2024-08-19 17:37:39,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-19 17:38:17,774 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1900, loss[loss=0.102, beats_loss=0.009461, ecapa_loss=0.0001538, whisper_loss=0.09105, over 17253.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01021, ecapa_loss=0.0001371, whisper_loss=0.08974, over 3749331.14 frames. ], batch size: 68, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:38:32,021 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 20 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-19 17:38:49,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4465090.0, ans=0.1 2024-08-19 17:39:07,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2024-08-19 17:39:12,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4465190.0, ans=0.125 2024-08-19 17:39:27,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. 
limit=15.0 2024-08-19 17:39:28,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.273e+01 2.509e+01 2.742e+01 5.984e+01, threshold=5.017e+01, percent-clipped=1.0 2024-08-19 17:39:41,894 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 1950, loss[loss=0.1016, beats_loss=0.01168, ecapa_loss=0.0001112, whisper_loss=0.08885, over 18621.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01028, ecapa_loss=0.000135, whisper_loss=0.08996, over 3720673.77 frames. ], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:39:59,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4465490.0, ans=0.125 2024-08-19 17:40:17,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4465590.0, ans=0.125 2024-08-19 17:40:29,361 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 19 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 17:40:37,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4465690.0, ans=0.2 2024-08-19 17:41:07,330 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2000, loss[loss=0.1093, beats_loss=0.009412, ecapa_loss=0.0001499, whisper_loss=0.09838, over 21676.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01033, ecapa_loss=0.0001346, whisper_loss=0.08934, over 3740506.68 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:41:13,178 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 17:41:19,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4465890.0, ans=0.04949747468305833 2024-08-19 17:41:26,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4465990.0, ans=0.125 2024-08-19 17:41:55,016 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 16 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-19 17:41:56,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4466190.0, ans=0.125 2024-08-19 17:41:59,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.33 vs. limit=10.0 2024-08-19 17:42:11,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2024-08-19 17:42:16,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4466290.0, ans=0.125 2024-08-19 17:42:18,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.386e+01 2.592e+01 2.882e+01 2.246e+02, threshold=5.185e+01, percent-clipped=4.0 2024-08-19 17:42:20,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4466290.0, ans=0.125 2024-08-19 17:42:31,611 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2050, loss[loss=0.118, beats_loss=0.01006, ecapa_loss=0.0001256, whisper_loss=0.1067, over 24332.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0103, ecapa_loss=0.0001352, whisper_loss=0.08949, over 3730206.16 frames. 
], batch size: 94, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:42:33,552 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 17:42:43,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4466390.0, ans=0.125 2024-08-19 17:42:46,333 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 17:43:10,672 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-19 17:43:26,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4466690.0, ans=0.025 2024-08-19 17:43:36,763 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 30 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 17:43:40,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4466790.0, ans=0.0 2024-08-19 17:43:40,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4466790.0, ans=0.125 2024-08-19 17:43:58,912 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2100, loss[loss=0.08433, beats_loss=0.01162, ecapa_loss=0.0001439, whisper_loss=0.07127, over 16806.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.0001344, whisper_loss=0.08878, over 3741832.97 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:44:13,106 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 17:44:26,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4466990.0, ans=0.1 2024-08-19 17:44:28,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4466990.0, ans=0.125 2024-08-19 17:44:28,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2024-08-19 17:44:33,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-19 17:44:43,364 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 24 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-19 17:44:43,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4467090.0, ans=0.0 2024-08-19 17:44:58,780 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 17:45:00,698 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
17 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-19 17:45:00,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4467190.0, ans=0.0 2024-08-19 17:45:04,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4467190.0, ans=0.125 2024-08-19 17:45:09,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4467290.0, ans=0.2 2024-08-19 17:45:10,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.301e+01 2.618e+01 2.880e+01 6.452e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-19 17:45:17,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4467290.0, ans=0.125 2024-08-19 17:45:24,068 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2150, loss[loss=0.1178, beats_loss=0.009597, ecapa_loss=0.0001344, whisper_loss=0.1068, over 19628.00 frames. ], tot_loss[loss=0.09971, beats_loss=0.0104, ecapa_loss=0.0001342, whisper_loss=0.08796, over 3689334.59 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:45:36,453 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 17:45:49,075 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 25 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 17:45:58,933 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 17:46:00,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4467590.0, ans=0.125 2024-08-19 17:46:02,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.35 vs. 
limit=15.0 2024-08-19 17:46:07,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-19 17:46:09,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4467590.0, ans=0.025 2024-08-19 17:46:26,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4467690.0, ans=0.0 2024-08-19 17:46:33,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4467790.0, ans=0.1 2024-08-19 17:46:51,887 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2200, loss[loss=0.1029, beats_loss=0.008249, ecapa_loss=0.0001318, whisper_loss=0.09335, over 16772.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01048, ecapa_loss=0.0001341, whisper_loss=0.0885, over 3708753.94 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:46:56,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4467890.0, ans=0.125 2024-08-19 17:47:00,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4467890.0, ans=0.04949747468305833 2024-08-19 17:47:05,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-19 17:47:06,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4467890.0, ans=0.125 2024-08-19 17:47:25,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. 
limit=15.0 2024-08-19 17:47:27,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-08-19 17:47:32,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4468090.0, ans=0.1 2024-08-19 17:47:35,386 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 17:47:47,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4468190.0, ans=0.125 2024-08-19 17:47:57,476 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-19 17:48:04,920 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.319e+01 2.611e+01 2.846e+01 3.358e+02, threshold=5.223e+01, percent-clipped=1.0 2024-08-19 17:48:14,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4468290.0, ans=0.2 2024-08-19 17:48:17,616 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2250, loss[loss=0.07687, beats_loss=0.01027, ecapa_loss=0.000117, whisper_loss=0.06543, over 14933.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01055, ecapa_loss=0.0001338, whisper_loss=0.08908, over 3713450.14 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:48:21,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4468390.0, ans=0.125 2024-08-19 17:48:43,096 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 
19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 17:48:43,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4468490.0, ans=0.0 2024-08-19 17:49:05,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4468590.0, ans=0.5 2024-08-19 17:49:19,385 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 17:49:29,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2024-08-19 17:49:32,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4468790.0, ans=0.0 2024-08-19 17:49:42,743 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2300, loss[loss=0.1036, beats_loss=0.01123, ecapa_loss=0.0001425, whisper_loss=0.09096, over 19563.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001352, whisper_loss=0.08925, over 3717739.64 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 17:50:01,975 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 17:50:08,624 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 28 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 17:50:14,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4468990.0, ans=0.0 2024-08-19 17:50:54,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.331e+01 2.598e+01 2.979e+01 4.563e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-19 17:50:55,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.97 vs. 
limit=6.0 2024-08-19 17:51:07,667 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2350, loss[loss=0.101, beats_loss=0.0114, ecapa_loss=0.0001497, whisper_loss=0.08814, over 22098.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001368, whisper_loss=0.09017, over 3761094.49 frames. ], batch size: 94, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:51:15,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4469390.0, ans=0.125 2024-08-19 17:51:33,429 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 33 from Vox, 26 fro AS 2024-08-19 17:51:42,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4469590.0, ans=0.125 2024-08-19 17:51:57,682 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 30 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 17:52:06,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4469690.0, ans=0.125 2024-08-19 17:52:22,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4469790.0, ans=10.0 2024-08-19 17:52:33,339 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2400, loss[loss=0.105, beats_loss=0.01335, ecapa_loss=0.0001311, whisper_loss=0.09031, over 22582.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001367, whisper_loss=0.09052, over 3738967.17 frames. 
], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:52:51,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4469990.0, ans=0.125 2024-08-19 17:52:53,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4469990.0, ans=0.0 2024-08-19 17:53:15,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4470090.0, ans=0.05 2024-08-19 17:53:21,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=12.0 2024-08-19 17:53:31,485 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 37 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 17:53:37,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-08-19 17:53:46,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.352e+01 2.498e+01 2.702e+01 6.582e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-19 17:53:47,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4470290.0, ans=0.125 2024-08-19 17:54:01,060 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2450, loss[loss=0.09529, beats_loss=0.01142, ecapa_loss=0.0001477, whisper_loss=0.0824, over 21935.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001376, whisper_loss=0.0911, over 3754137.59 frames. 
], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:54:48,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4470590.0, ans=0.0 2024-08-19 17:55:25,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4470790.0, ans=0.125 2024-08-19 17:55:25,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4470790.0, ans=0.125 2024-08-19 17:55:26,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=4470790.0, ans=0.2 2024-08-19 17:55:28,490 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2500, loss[loss=0.1121, beats_loss=0.007516, ecapa_loss=0.0001313, whisper_loss=0.1032, over 15825.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.000138, whisper_loss=0.09023, over 3748698.06 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:55:30,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4470890.0, ans=0.0 2024-08-19 17:55:46,340 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
15 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 17:56:08,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4471090.0, ans=0.125 2024-08-19 17:56:13,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4471090.0, ans=0.125 2024-08-19 17:56:13,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4471090.0, ans=0.125 2024-08-19 17:56:13,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-08-19 17:56:24,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4471190.0, ans=0.125 2024-08-19 17:56:40,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.297e+01 2.536e+01 2.855e+01 4.497e+01, threshold=5.072e+01, percent-clipped=1.0 2024-08-19 17:56:42,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=4471290.0, ans=0.02 2024-08-19 17:56:46,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4471290.0, ans=0.2 2024-08-19 17:56:48,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.11 vs. limit=15.0 2024-08-19 17:56:54,078 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2550, loss[loss=0.0954, beats_loss=0.00957, ecapa_loss=0.000133, whisper_loss=0.0845, over 16156.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001375, whisper_loss=0.09015, over 3751053.80 frames. 
], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:57:18,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-08-19 17:57:35,165 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 17:57:43,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-19 17:57:56,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=12.0 2024-08-19 17:58:04,447 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 16 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 17:58:06,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4471790.0, ans=0.0 2024-08-19 17:58:19,397 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2600, loss[loss=0.1126, beats_loss=0.007686, ecapa_loss=0.0001214, whisper_loss=0.1037, over 16489.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001377, whisper_loss=0.09038, over 3782721.66 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:58:21,672 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 20 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 17:58:25,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4471890.0, ans=0.125 2024-08-19 17:58:28,242 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
17 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 17:58:33,999 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.595e+01 2024-08-19 17:58:53,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4472090.0, ans=0.1 2024-08-19 17:58:59,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.18 vs. limit=6.0 2024-08-19 17:59:03,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4472090.0, ans=0.2 2024-08-19 17:59:11,885 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 17 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 17:59:12,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4472190.0, ans=0.2 2024-08-19 17:59:14,039 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 17:59:17,175 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 17:59:33,104 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.323e+01 2.520e+01 2.771e+01 4.731e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 17:59:38,940 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 17:59:47,283 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2650, loss[loss=0.1098, beats_loss=0.009725, ecapa_loss=0.0001193, whisper_loss=0.09889, over 19714.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.09016, over 3802118.28 frames. 
], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 17:59:48,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=4472390.0, ans=22.5 2024-08-19 17:59:57,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4472390.0, ans=0.2 2024-08-19 18:00:31,893 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 18 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 18:00:44,671 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 18:00:58,664 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 30 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-19 18:01:18,033 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2700, loss[loss=0.09786, beats_loss=0.01185, ecapa_loss=0.0001351, whisper_loss=0.08465, over 19646.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001385, whisper_loss=0.0899, over 3818804.54 frames. ], batch size: 80, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:01:50,064 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 19 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 18:01:55,673 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 18:02:18,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4473190.0, ans=0.125 2024-08-19 18:02:18,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4473190.0, ans=0.125 2024-08-19 18:02:26,869 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
28 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 18:02:31,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.406e+01 2.691e+01 2.981e+01 2.904e+02, threshold=5.383e+01, percent-clipped=2.0 2024-08-19 18:02:33,852 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 18:02:45,274 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2750, loss[loss=0.09891, beats_loss=0.009997, ecapa_loss=0.0001358, whisper_loss=0.08756, over 19215.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001377, whisper_loss=0.08959, over 3848725.83 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:02:50,939 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 18:03:02,352 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 28 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 18:03:04,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4473490.0, ans=0.95 2024-08-19 18:03:04,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4473490.0, ans=0.0 2024-08-19 18:03:06,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4473490.0, ans=0.1 2024-08-19 18:03:18,411 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
33 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 18:03:42,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4473690.0, ans=0.125 2024-08-19 18:03:55,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4473790.0, ans=0.125 2024-08-19 18:03:56,794 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 27 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 18:04:13,812 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2800, loss[loss=0.08456, beats_loss=0.01198, ecapa_loss=0.0001316, whisper_loss=0.07126, over 17694.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001373, whisper_loss=0.09003, over 3851357.14 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:04:27,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4473890.0, ans=0.125 2024-08-19 18:04:50,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=4474090.0, ans=0.1 2024-08-19 18:05:02,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4474090.0, ans=0.0 2024-08-19 18:05:09,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4474190.0, ans=0.125 2024-08-19 18:05:21,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4474190.0, ans=0.2 2024-08-19 18:05:21,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4474190.0, ans=0.1 2024-08-19 18:05:28,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, 
num_channels=192, metric=5.25 vs. limit=15.0 2024-08-19 18:05:28,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.228e+01 2.485e+01 2.852e+01 2.973e+02, threshold=4.969e+01, percent-clipped=1.0 2024-08-19 18:05:37,386 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 18:05:39,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4474290.0, ans=0.1 2024-08-19 18:05:43,144 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2850, loss[loss=0.09835, beats_loss=0.01117, ecapa_loss=0.0001253, whisper_loss=0.08593, over 23182.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001371, whisper_loss=0.08964, over 3824597.75 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:05:45,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4474390.0, ans=0.0 2024-08-19 18:05:50,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.11 vs. limit=22.5 2024-08-19 18:05:52,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4474390.0, ans=0.125 2024-08-19 18:05:58,957 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 29 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 18:06:17,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4474590.0, ans=0.125 2024-08-19 18:06:31,394 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
25 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 18:06:36,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4474690.0, ans=0.0 2024-08-19 18:07:01,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4474790.0, ans=0.125 2024-08-19 18:07:01,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4474790.0, ans=0.0 2024-08-19 18:07:11,016 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2900, loss[loss=0.08948, beats_loss=0.01017, ecapa_loss=0.0001267, whisper_loss=0.07804, over 22642.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001381, whisper_loss=0.08959, over 3832349.45 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:07:25,614 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 18:08:27,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.549e+01 2.227e+01 2.443e+01 2.748e+01 5.602e+01, threshold=4.887e+01, percent-clipped=1.0 2024-08-19 18:08:41,769 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 2950, loss[loss=0.1073, beats_loss=0.008977, ecapa_loss=0.0001763, whisper_loss=0.0966, over 17122.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001385, whisper_loss=0.08913, over 3839537.55 frames. ], batch size: 68, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:08:56,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0 2024-08-19 18:09:10,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. 
limit=15.0 2024-08-19 18:09:37,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4475690.0, ans=0.125 2024-08-19 18:10:05,204 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 18:10:12,309 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3000, loss[loss=0.1057, beats_loss=0.01028, ecapa_loss=0.0001092, whisper_loss=0.09434, over 23603.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001384, whisper_loss=0.08965, over 3873804.89 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:10:12,310 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-19 18:10:48,079 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on ASR_libri: loss=0.2543, beats_loss=0, ecapa_loss=0.0005052, whisper_loss=0.2492, over 931116.00 frames. 2024-08-19 18:11:09,459 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003946, beats_loss=0, ecapa_loss=0.0003946, whisper_loss=0, over 944235.00 frames. 2024-08-19 18:12:27,420 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6397, 2.0404, 2.1887, 1.6723, 1.7467, 2.4555, 2.9268, 1.8862], device='cuda:0') 2024-08-19 18:12:49,006 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on AT_audioset: loss=0.02308, beats_loss=0.02308, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 18:12:49,010 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-19 18:13:00,513 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 18:13:16,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4475990.0, ans=0.07 2024-08-19 18:13:31,027 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.121e-01 2024-08-19 18:13:32,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4476090.0, ans=0.0 2024-08-19 18:13:34,462 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 30 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 18:13:45,282 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 18:14:01,380 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 31 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 18:14:02,442 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.325e+01 2.611e+01 2.866e+01 5.886e+01, threshold=5.222e+01, percent-clipped=1.0 2024-08-19 18:14:05,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4476290.0, ans=0.125 2024-08-19 18:14:12,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4476290.0, ans=0.125 2024-08-19 18:14:16,850 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3050, loss[loss=0.09357, beats_loss=0.01075, ecapa_loss=0.0001619, whisper_loss=0.0812, over 18159.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001393, whisper_loss=0.09024, over 3882769.91 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:14:17,069 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
29 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-19 18:14:17,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4476390.0, ans=0.0 2024-08-19 18:14:54,512 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 12 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 18:15:09,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4476590.0, ans=0.125 2024-08-19 18:15:19,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4476690.0, ans=0.125 2024-08-19 18:15:36,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4476790.0, ans=0.0 2024-08-19 18:15:38,244 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 13 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 18:15:40,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4476790.0, ans=0.125 2024-08-19 18:15:47,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4476790.0, ans=0.125 2024-08-19 18:15:50,661 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3100, loss[loss=0.1114, beats_loss=0.01061, ecapa_loss=0.0001652, whisper_loss=0.09917, over 22935.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001408, whisper_loss=0.09044, over 3862998.27 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:15:57,103 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 
21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 18:15:59,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4476890.0, ans=0.125 2024-08-19 18:16:01,023 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 18:16:17,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4476990.0, ans=0.1 2024-08-19 18:16:18,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4476990.0, ans=0.1 2024-08-19 18:16:18,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4476990.0, ans=0.125 2024-08-19 18:16:26,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4477090.0, ans=0.0 2024-08-19 18:16:45,980 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 18:16:56,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4477190.0, ans=0.125 2024-08-19 18:17:07,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.384e+01 2.601e+01 2.875e+01 4.295e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-19 18:17:22,604 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3150, loss[loss=0.09809, beats_loss=0.01127, ecapa_loss=0.0001521, whisper_loss=0.0853, over 19101.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.09048, over 3856783.74 frames. 
], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:17:52,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5 2024-08-19 18:18:04,480 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 28 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-19 18:18:05,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-19 18:18:13,639 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.472e+00 2024-08-19 18:18:17,218 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:18:27,462 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 18:18:36,566 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 18 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 18:18:49,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4477790.0, ans=0.125 2024-08-19 18:18:52,527 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3200, loss[loss=0.1, beats_loss=0.008383, ecapa_loss=0.0001697, whisper_loss=0.08995, over 21174.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01041, ecapa_loss=0.0001414, whisper_loss=0.09125, over 3838413.68 frames. 
], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:18:57,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4477890.0, ans=0.1 2024-08-19 18:19:28,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4478090.0, ans=0.1 2024-08-19 18:19:49,012 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 18 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 18:20:07,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.317e+01 2.495e+01 2.835e+01 3.728e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-19 18:20:15,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4478290.0, ans=0.5 2024-08-19 18:20:16,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4478290.0, ans=0.125 2024-08-19 18:20:21,888 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3250, loss[loss=0.09164, beats_loss=0.01113, ecapa_loss=0.0001709, whisper_loss=0.0788, over 16128.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01038, ecapa_loss=0.0001413, whisper_loss=0.09124, over 3808634.83 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:20:42,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4478490.0, ans=0.035 2024-08-19 18:21:14,445 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
21 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 18:21:21,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4478690.0, ans=0.1 2024-08-19 18:21:21,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4478690.0, ans=0.1 2024-08-19 18:21:40,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4478790.0, ans=0.125 2024-08-19 18:21:41,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4478790.0, ans=0.0 2024-08-19 18:21:45,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4478790.0, ans=0.125 2024-08-19 18:21:49,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4478790.0, ans=0.0 2024-08-19 18:21:51,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4478790.0, ans=0.0 2024-08-19 18:21:52,924 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 26 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 18:21:54,148 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3300, loss[loss=0.1191, beats_loss=0.00892, ecapa_loss=0.0001278, whisper_loss=0.1089, over 17188.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.09161, over 3847304.21 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:22:07,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2024-08-19 18:22:22,370 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
17 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 18:22:36,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4479090.0, ans=0.025 2024-08-19 18:22:42,157 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 18:22:46,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4479190.0, ans=22.5 2024-08-19 18:23:02,763 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 18:23:07,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.244e+01 2.479e+01 2.933e+01 3.930e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-19 18:23:15,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4479290.0, ans=0.1 2024-08-19 18:23:17,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=10.0 2024-08-19 18:23:23,204 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3350, loss[loss=0.08431, beats_loss=0.01367, ecapa_loss=0.0001256, whisper_loss=0.06938, over 18688.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01037, ecapa_loss=0.0001421, whisper_loss=0.09162, over 3828126.94 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:23:25,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4479390.0, ans=0.125 2024-08-19 18:24:30,124 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 18:24:37,212 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
22 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 18:24:45,950 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 18:24:52,562 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3400, loss[loss=0.1032, beats_loss=0.009951, ecapa_loss=0.0001082, whisper_loss=0.09217, over 19334.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001418, whisper_loss=0.0908, over 3842675.97 frames. ], batch size: 74, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:25:01,733 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 25 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 18:25:10,407 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-448000.pt 2024-08-19 18:25:27,337 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 29 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 18:25:28,947 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 18:25:30,461 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 17 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-19 18:25:38,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2024-08-19 18:25:44,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4480090.0, ans=0.0 2024-08-19 18:25:47,449 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 30 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 18:26:06,064 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 18:26:06,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4480290.0, ans=0.125 2024-08-19 18:26:09,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.304e+01 2.534e+01 2.812e+01 7.025e+01, threshold=5.069e+01, percent-clipped=1.0 2024-08-19 18:26:10,002 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-19 18:26:19,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4480290.0, ans=0.125 2024-08-19 18:26:24,785 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3450, loss[loss=0.1144, beats_loss=0.008896, ecapa_loss=0.0001701, whisper_loss=0.1038, over 21145.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001419, whisper_loss=0.09134, over 3852829.51 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:26:26,766 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 23 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 18:26:46,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-19 18:26:49,162 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 18 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-19 18:27:23,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4480690.0, ans=0.0 2024-08-19 18:27:50,097 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3500, loss[loss=0.1131, beats_loss=0.01177, ecapa_loss=0.0001486, whisper_loss=0.09985, over 15729.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001416, whisper_loss=0.09105, over 3815543.66 frames. 
], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:28:28,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4481090.0, ans=0.1 2024-08-19 18:28:29,573 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 18:28:31,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4481090.0, ans=0.0 2024-08-19 18:28:41,787 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 18 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-19 18:28:59,804 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 30 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-19 18:29:01,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.232e+01 2.458e+01 2.911e+01 6.376e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-19 18:29:05,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4481290.0, ans=0.125 2024-08-19 18:29:14,273 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3550, loss[loss=0.1089, beats_loss=0.01017, ecapa_loss=0.0001423, whisper_loss=0.09729, over 22094.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001414, whisper_loss=0.09094, over 3814257.29 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:29:16,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4481390.0, ans=0.125 2024-08-19 18:29:24,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4481390.0, ans=0.025 2024-08-19 18:29:45,714 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 18:29:50,179 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 18:29:53,690 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 19 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 18:29:53,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=4481590.0, ans=0.1 2024-08-19 18:29:58,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.58 vs. limit=22.5 2024-08-19 18:30:09,415 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 18:30:13,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-19 18:30:22,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.44 vs. limit=22.5 2024-08-19 18:30:26,887 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 30 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-19 18:30:34,755 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3600, loss[loss=0.1003, beats_loss=0.01143, ecapa_loss=0.0001141, whisper_loss=0.0877, over 23092.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001399, whisper_loss=0.09099, over 3807578.78 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:30:41,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4481890.0, ans=0.0 2024-08-19 18:30:47,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.17 vs. 
limit=15.0 2024-08-19 18:30:49,554 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 18:31:09,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4482090.0, ans=0.125 2024-08-19 18:31:23,145 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 23 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-19 18:31:28,414 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 18:31:42,014 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.192e+01 2.433e+01 2.584e+01 3.997e+01, threshold=4.865e+01, percent-clipped=0.0 2024-08-19 18:31:49,200 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 16 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 18:31:53,704 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-19 18:31:54,780 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3650, loss[loss=0.1017, beats_loss=0.01137, ecapa_loss=0.0001076, whisper_loss=0.08921, over 15311.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001395, whisper_loss=0.0909, over 3815195.24 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:32:09,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.23 vs. 
limit=15.0 2024-08-19 18:32:19,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4482490.0, ans=0.1 2024-08-19 18:32:22,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4482490.0, ans=0.05 2024-08-19 18:32:39,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4482590.0, ans=0.0 2024-08-19 18:33:14,972 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3700, loss[loss=0.09341, beats_loss=0.009654, ecapa_loss=0.0001344, whisper_loss=0.08242, over 17442.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.09092, over 3781362.40 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:33:36,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4482990.0, ans=0.1 2024-08-19 18:33:55,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4483090.0, ans=0.0 2024-08-19 18:33:58,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4483090.0, ans=0.0 2024-08-19 18:34:08,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4483090.0, ans=0.025 2024-08-19 18:34:09,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4483190.0, ans=0.05 2024-08-19 18:34:09,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4483190.0, ans=0.0 2024-08-19 18:34:14,095 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4483190.0, ans=0.0 2024-08-19 18:34:18,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.08 vs. limit=10.0 2024-08-19 18:34:25,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4483290.0, ans=0.2 2024-08-19 18:34:30,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.652e+01 2.276e+01 2.511e+01 2.757e+01 7.975e+01, threshold=5.022e+01, percent-clipped=3.0 2024-08-19 18:34:31,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4483290.0, ans=0.125 2024-08-19 18:34:35,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4483290.0, ans=0.1 2024-08-19 18:34:42,963 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3750, loss[loss=0.1126, beats_loss=0.009468, ecapa_loss=0.0001773, whisper_loss=0.1014, over 21236.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001391, whisper_loss=0.09047, over 3758895.61 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:34:46,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4483390.0, ans=0.125 2024-08-19 18:34:50,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4483390.0, ans=0.125 2024-08-19 18:34:56,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4483390.0, ans=0.125 2024-08-19 18:35:02,765 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 18:35:04,301 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 18:35:04,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4483490.0, ans=0.125 2024-08-19 18:35:07,336 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 13 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 18:35:10,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4483490.0, ans=0.125 2024-08-19 18:35:17,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4483590.0, ans=0.0 2024-08-19 18:35:19,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4483590.0, ans=0.125 2024-08-19 18:35:20,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4483590.0, ans=0.1 2024-08-19 18:35:33,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4483690.0, ans=0.1 2024-08-19 18:36:03,436 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3800, loss[loss=0.1049, beats_loss=0.01125, ecapa_loss=0.00015, whisper_loss=0.09219, over 20287.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.09112, over 3756074.22 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:36:05,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4483890.0, ans=0.0 2024-08-19 18:36:07,076 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
22 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 18:36:07,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4483890.0, ans=0.1 2024-08-19 18:36:14,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4483890.0, ans=0.1 2024-08-19 18:36:20,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4483990.0, ans=0.2 2024-08-19 18:36:27,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4483990.0, ans=0.1 2024-08-19 18:36:38,226 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 18:36:51,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4484190.0, ans=0.125 2024-08-19 18:36:57,494 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 23 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 18:37:07,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4484290.0, ans=0.125 2024-08-19 18:37:07,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=15.0 2024-08-19 18:37:09,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.312e+01 2.559e+01 2.923e+01 4.060e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:37:22,522 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3850, loss[loss=0.09961, beats_loss=0.01058, ecapa_loss=0.0001401, whisper_loss=0.08763, over 16435.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01036, ecapa_loss=0.0001404, whisper_loss=0.09138, over 3782551.86 frames. 
], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:37:33,624 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 25 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 18:37:47,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4484490.0, ans=0.1 2024-08-19 18:37:48,431 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 17 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 18:38:19,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4484690.0, ans=0.125 2024-08-19 18:38:21,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4484690.0, ans=0.0 2024-08-19 18:38:31,702 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 18:38:40,575 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3900, loss[loss=0.1155, beats_loss=0.01038, ecapa_loss=0.0001326, whisper_loss=0.1038, over 19538.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001399, whisper_loss=0.09107, over 3779592.33 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:39:04,327 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 18:39:14,778 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
21 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-19 18:39:28,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4485190.0, ans=0.5 2024-08-19 18:39:47,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.325e+01 2.529e+01 2.804e+01 3.948e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 18:40:00,819 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 3950, loss[loss=0.08889, beats_loss=0.01212, ecapa_loss=0.0001086, whisper_loss=0.07569, over 23540.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001411, whisper_loss=0.09081, over 3804999.27 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:40:13,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4485390.0, ans=0.125 2024-08-19 18:40:19,935 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 18:40:26,113 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 18:40:43,285 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 18:40:46,345 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 18:41:02,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.19 vs. limit=15.0 2024-08-19 18:41:15,136 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 18:41:22,594 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4000, loss[loss=0.1156, beats_loss=0.01096, ecapa_loss=0.0001031, whisper_loss=0.1036, over 19175.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01036, ecapa_loss=0.0001408, whisper_loss=0.09174, over 3830784.05 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:41:24,722 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 19 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-19 18:42:01,448 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 18:42:29,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.300e+01 2.585e+01 3.012e+01 4.802e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-19 18:42:30,052 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 11 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 18:42:33,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4486290.0, ans=0.125 2024-08-19 18:42:42,331 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4050, loss[loss=0.08811, beats_loss=0.00943, ecapa_loss=0.000166, whisper_loss=0.07702, over 17358.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001406, whisper_loss=0.09091, over 3858466.41 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:43:02,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4486490.0, ans=0.125 2024-08-19 18:43:14,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4486590.0, ans=0.125 2024-08-19 18:43:16,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.55 vs. limit=15.0 2024-08-19 18:43:18,634 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
36 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 18:43:23,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4486590.0, ans=0.5 2024-08-19 18:43:29,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4486690.0, ans=0.125 2024-08-19 18:43:38,305 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 18:43:49,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4486790.0, ans=0.05 2024-08-19 18:43:55,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4486790.0, ans=0.2 2024-08-19 18:44:01,532 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4100, loss[loss=0.1059, beats_loss=0.008254, ecapa_loss=0.0001533, whisper_loss=0.09611, over 20076.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.09076, over 3879184.16 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:44:01,808 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 18:44:16,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4486990.0, ans=0.0 2024-08-19 18:44:26,703 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 18:44:28,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. 
limit=15.0 2024-08-19 18:44:36,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4487090.0, ans=0.125 2024-08-19 18:44:52,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4487190.0, ans=0.0 2024-08-19 18:45:00,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4487190.0, ans=0.125 2024-08-19 18:45:03,688 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 18:45:07,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.401e+01 2.726e+01 3.123e+01 1.504e+02, threshold=5.451e+01, percent-clipped=2.0 2024-08-19 18:45:20,294 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4150, loss[loss=0.09116, beats_loss=0.009205, ecapa_loss=0.000174, whisper_loss=0.08022, over 16710.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001409, whisper_loss=0.09055, over 3872941.73 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:45:48,914 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-19 18:45:49,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4487490.0, ans=0.125 2024-08-19 18:45:56,650 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 
19 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-19 18:46:01,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4487590.0, ans=0.95 2024-08-19 18:46:01,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4487590.0, ans=0.1 2024-08-19 18:46:05,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4487590.0, ans=0.0 2024-08-19 18:46:07,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4487690.0, ans=0.0 2024-08-19 18:46:12,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-19 18:46:18,366 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 14 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-19 18:46:25,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4487790.0, ans=0.0 2024-08-19 18:46:26,546 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 18:46:34,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4487790.0, ans=0.2 2024-08-19 18:46:40,451 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4200, loss[loss=0.09536, beats_loss=0.01037, ecapa_loss=0.0001277, whisper_loss=0.08371, over 17055.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001405, whisper_loss=0.09005, over 3847864.64 frames. 
], batch size: 69, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:46:46,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4487890.0, ans=0.125 2024-08-19 18:46:54,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4487890.0, ans=0.1 2024-08-19 18:46:59,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4487990.0, ans=0.0 2024-08-19 18:47:02,267 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2024-08-19 18:47:37,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4488190.0, ans=0.125 2024-08-19 18:47:49,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.234e+01 2.488e+01 2.803e+01 1.323e+02, threshold=4.977e+01, percent-clipped=2.0 2024-08-19 18:47:55,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4488290.0, ans=0.0 2024-08-19 18:48:02,443 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4250, loss[loss=0.1217, beats_loss=0.008125, ecapa_loss=0.0001428, whisper_loss=0.1121, over 24384.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.00014, whisper_loss=0.0902, over 3866705.70 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:48:10,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4488390.0, ans=0.09899494936611666 2024-08-19 18:48:38,308 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
18 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-19 18:48:38,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4488590.0, ans=0.0 2024-08-19 18:49:22,716 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4300, loss[loss=0.1067, beats_loss=0.01226, ecapa_loss=0.0001234, whisper_loss=0.09323, over 22336.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001403, whisper_loss=0.08996, over 3858667.07 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:49:36,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2024-08-19 18:50:02,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4489090.0, ans=0.0 2024-08-19 18:50:30,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.303e+01 2.487e+01 2.877e+01 4.114e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-19 18:50:37,871 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 18:50:42,546 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 22 from LS+wenet, 14 from Vox, 16 fro AS 2024-08-19 18:50:43,617 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4350, loss[loss=0.1276, beats_loss=0.007467, ecapa_loss=0.0001491, whisper_loss=0.1187, over 14115.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001394, whisper_loss=0.09, over 3838385.34 frames. ], batch size: 52, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:50:53,998 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
20 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 18:50:56,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0 2024-08-19 18:51:07,625 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 12 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 18:51:21,690 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 18:51:23,225 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 18:51:31,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2024-08-19 18:51:34,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4489690.0, ans=0.1 2024-08-19 18:51:45,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4489690.0, ans=0.0 2024-08-19 18:51:56,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=12.0 2024-08-19 18:52:03,946 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4400, loss[loss=0.1113, beats_loss=0.01016, ecapa_loss=0.0001565, whisper_loss=0.09958, over 19259.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.09013, over 3815983.81 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:52:04,316 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 14 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-19 18:52:06,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.59 vs. 
limit=15.0 2024-08-19 18:52:32,092 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 20 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-19 18:53:11,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.271e+01 2.455e+01 2.760e+01 4.090e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-19 18:53:23,761 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4450, loss[loss=0.1026, beats_loss=0.008564, ecapa_loss=0.0001257, whisper_loss=0.09282, over 17067.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001399, whisper_loss=0.09037, over 3767304.02 frames. ], batch size: 67, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:53:25,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4490390.0, ans=0.125 2024-08-19 18:54:17,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4490690.0, ans=0.1 2024-08-19 18:54:17,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4490690.0, ans=0.125 2024-08-19 18:54:31,765 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.603e+01 2024-08-19 18:54:44,726 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4500, loss[loss=0.1106, beats_loss=0.01124, ecapa_loss=0.0001408, whisper_loss=0.09796, over 14964.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09016, over 3776967.80 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:54:48,482 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 25 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 18:54:51,906 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 
15 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-19 18:54:52,204 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:54:58,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4490890.0, ans=0.0 2024-08-19 18:55:08,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2024-08-19 18:55:17,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4491090.0, ans=0.1 2024-08-19 18:55:19,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4491090.0, ans=0.125 2024-08-19 18:55:40,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4491190.0, ans=0.0 2024-08-19 18:55:54,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+01 2.263e+01 2.559e+01 2.809e+01 3.466e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 18:56:00,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4491290.0, ans=0.125 2024-08-19 18:56:07,842 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4550, loss[loss=0.1146, beats_loss=0.006637, ecapa_loss=0.0001512, whisper_loss=0.1064, over 13674.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.09008, over 3753724.13 frames. 
], batch size: 51, lr: 1.98e-03, grad_scale: 1.152921504606847e+18 2024-08-19 18:56:16,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4491390.0, ans=0.2 2024-08-19 18:56:19,039 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 18:56:23,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4491490.0, ans=0.0 2024-08-19 18:56:25,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4491490.0, ans=0.125 2024-08-19 18:56:47,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4491590.0, ans=0.0 2024-08-19 18:56:57,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4491690.0, ans=0.0 2024-08-19 18:57:04,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4491690.0, ans=0.125 2024-08-19 18:57:11,827 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 18:57:12,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4491690.0, ans=0.2 2024-08-19 18:57:13,527 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 18:57:19,266 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 18:57:32,997 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4600, loss[loss=0.1033, beats_loss=0.01082, ecapa_loss=0.0001768, whisper_loss=0.09068, over 21947.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001404, whisper_loss=0.08999, over 3785782.01 frames. 
], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:57:47,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4491890.0, ans=0.0 2024-08-19 18:57:47,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-19 18:57:53,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4491990.0, ans=10.0 2024-08-19 18:58:04,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4491990.0, ans=0.0 2024-08-19 18:58:06,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0 2024-08-19 18:58:09,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4492090.0, ans=0.125 2024-08-19 18:58:24,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-08-19 18:58:27,407 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 37 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 18:58:46,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.297e+01 2.492e+01 2.828e+01 4.082e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-19 18:58:57,895 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4650, loss[loss=0.1125, beats_loss=0.009487, ecapa_loss=0.0001247, whisper_loss=0.1018, over 22396.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.08985, over 3801332.56 frames. 
], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 18:59:01,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4492390.0, ans=0.125 2024-08-19 18:59:27,241 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 18:59:39,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4492590.0, ans=0.125 2024-08-19 19:00:09,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4492790.0, ans=0.04949747468305833 2024-08-19 19:00:21,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4492890.0, ans=0.1 2024-08-19 19:00:22,847 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4700, loss[loss=0.07964, beats_loss=0.01126, ecapa_loss=0.0001236, whisper_loss=0.06714, over 22478.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001415, whisper_loss=0.08966, over 3782732.30 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:00:35,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-19 19:00:38,245 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 19:00:45,387 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 22 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 19:00:55,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4493090.0, ans=0.0 2024-08-19 19:00:56,820 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-19 19:01:28,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4493290.0, ans=0.015 2024-08-19 19:01:30,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4493290.0, ans=0.125 2024-08-19 19:01:34,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.345e+01 2.552e+01 2.786e+01 4.462e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-19 19:01:36,476 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 19:01:43,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-19 19:01:45,839 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4750, loss[loss=0.1092, beats_loss=0.006547, ecapa_loss=0.0001711, whisper_loss=0.101, over 14017.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001419, whisper_loss=0.08958, over 3780017.42 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:01:51,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4493390.0, ans=0.125 2024-08-19 19:01:57,697 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 19:02:02,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4493490.0, ans=0.125 2024-08-19 19:02:07,204 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 19:02:09,059 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
27 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 19:02:20,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4493590.0, ans=0.2 2024-08-19 19:02:22,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4493590.0, ans=0.0 2024-08-19 19:02:25,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4493590.0, ans=0.125 2024-08-19 19:02:34,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4493690.0, ans=0.125 2024-08-19 19:02:36,556 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 33 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 19:02:42,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=15.0 2024-08-19 19:03:09,641 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4800, loss[loss=0.06177, beats_loss=0.01436, ecapa_loss=0.0001104, whisper_loss=0.04631, over 12300.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001418, whisper_loss=0.09031, over 3800947.38 frames. ], batch size: 51, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:03:10,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4493890.0, ans=0.125 2024-08-19 19:03:17,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4493890.0, ans=0.125 2024-08-19 19:03:27,770 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 19:03:41,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.70 vs. limit=10.0 2024-08-19 19:04:01,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4494190.0, ans=0.125 2024-08-19 19:04:16,903 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 19:04:21,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.336e+01 2.600e+01 2.820e+01 4.344e+01, threshold=5.200e+01, percent-clipped=0.0 2024-08-19 19:04:33,251 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4850, loss[loss=0.09327, beats_loss=0.01094, ecapa_loss=0.0001564, whisper_loss=0.08076, over 22198.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001417, whisper_loss=0.0901, over 3798082.72 frames. ], batch size: 94, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:04:34,677 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 27 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 19:04:55,825 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 19:04:57,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4494490.0, ans=0.0 2024-08-19 19:05:20,733 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 21 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 19:05:56,417 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4900, loss[loss=0.08886, beats_loss=0.01059, ecapa_loss=0.0001579, whisper_loss=0.07669, over 21632.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.08968, over 3817246.04 frames. 
], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:06:32,239 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 19:06:33,900 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 26 from LS+wenet, 25 from Vox, 15 fro AS 2024-08-19 19:06:36,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4495090.0, ans=0.125 2024-08-19 19:06:38,251 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 19:06:41,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4495090.0, ans=0.1 2024-08-19 19:06:42,981 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 24 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 19:06:43,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-08-19 19:07:04,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=4495290.0, ans=0.1 2024-08-19 19:07:10,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.336e+01 2.530e+01 2.860e+01 1.367e+02, threshold=5.061e+01, percent-clipped=1.0 2024-08-19 19:07:22,615 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 4950, loss[loss=0.1249, beats_loss=0.008726, ecapa_loss=0.0001528, whisper_loss=0.1147, over 22248.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001418, whisper_loss=0.0895, over 3824424.38 frames. 
], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:07:29,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4495390.0, ans=0.0 2024-08-19 19:07:31,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4495390.0, ans=0.2 2024-08-19 19:07:33,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-08-19 19:07:43,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4495490.0, ans=0.2 2024-08-19 19:07:48,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4495490.0, ans=0.0 2024-08-19 19:07:52,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4495490.0, ans=0.1 2024-08-19 19:08:05,559 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 29 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-19 19:08:39,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4495790.0, ans=0.125 2024-08-19 19:08:39,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4495790.0, ans=0.0 2024-08-19 19:08:49,523 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5000, loss[loss=0.08742, beats_loss=0.01208, ecapa_loss=0.0001611, whisper_loss=0.07372, over 20229.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001419, whisper_loss=0.0892, over 3825446.69 frames. 
], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:08:55,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4495890.0, ans=0.125 2024-08-19 19:09:09,877 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 19:09:17,023 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 19:09:27,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.91 vs. limit=15.0 2024-08-19 19:10:04,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.356e+01 2.547e+01 2.785e+01 7.027e+01, threshold=5.094e+01, percent-clipped=1.0 2024-08-19 19:10:05,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4496290.0, ans=0.125 2024-08-19 19:10:06,460 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 19:10:08,191 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 18 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-19 19:10:11,604 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 13 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 19:10:16,956 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5050, loss[loss=0.1009, beats_loss=0.008653, ecapa_loss=0.0001563, whisper_loss=0.09071, over 15007.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001421, whisper_loss=0.08957, over 3829663.98 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:10:18,722 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 19:10:49,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-08-19 19:11:00,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4496590.0, ans=0.0 2024-08-19 19:11:03,297 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 22 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-19 19:11:28,440 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 19:11:33,861 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:11:41,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2024-08-19 19:11:42,214 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5100, loss[loss=0.1302, beats_loss=0.009305, ecapa_loss=0.0001361, whisper_loss=0.1195, over 23559.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001416, whisper_loss=0.08993, over 3803737.58 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:11:42,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4496890.0, ans=0.125 2024-08-19 19:11:57,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4496990.0, ans=0.0 2024-08-19 19:12:00,665 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 19:12:00,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4496990.0, ans=0.1 2024-08-19 19:12:12,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4496990.0, ans=0.125 2024-08-19 19:12:17,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4497090.0, ans=0.125 2024-08-19 19:12:45,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4497190.0, ans=0.125 2024-08-19 19:12:50,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4497290.0, ans=0.1 2024-08-19 19:12:53,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.293e+01 2.514e+01 2.831e+01 4.907e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-19 19:13:05,685 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5150, loss[loss=0.1022, beats_loss=0.01135, ecapa_loss=0.0001474, whisper_loss=0.08936, over 15327.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001417, whisper_loss=0.0897, over 3786312.97 frames. ], batch size: 59, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:13:06,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4497390.0, ans=0.0 2024-08-19 19:13:21,918 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 19:13:39,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4497490.0, ans=0.1 2024-08-19 19:13:45,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4497590.0, ans=0.07 2024-08-19 19:13:53,245 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 19:13:53,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4497590.0, ans=0.125 2024-08-19 19:14:23,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4497790.0, ans=0.125 2024-08-19 19:14:24,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2024-08-19 19:14:27,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4497790.0, ans=0.125 2024-08-19 19:14:33,135 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5200, loss[loss=0.09807, beats_loss=0.009108, ecapa_loss=0.000134, whisper_loss=0.08763, over 13887.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.09032, over 3805032.08 frames. ], batch size: 53, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:15:08,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.76 vs. 
limit=22.5 2024-08-19 19:15:15,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4498090.0, ans=0.125 2024-08-19 19:15:27,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4498190.0, ans=0.1 2024-08-19 19:15:28,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2024-08-19 19:15:41,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4498290.0, ans=0.125 2024-08-19 19:15:46,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.272e+01 2.588e+01 2.852e+01 4.438e+01, threshold=5.176e+01, percent-clipped=0.0 2024-08-19 19:15:57,699 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5250, loss[loss=0.09287, beats_loss=0.01048, ecapa_loss=0.000163, whisper_loss=0.08076, over 22007.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001423, whisper_loss=0.09069, over 3827925.90 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:16:00,401 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
24 from LS+wenet, 14 from Vox, 51 fro AS 2024-08-19 19:16:03,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4498390.0, ans=0.0 2024-08-19 19:16:11,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4498390.0, ans=0.125 2024-08-19 19:16:19,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4498490.0, ans=0.125 2024-08-19 19:16:24,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4498490.0, ans=0.0 2024-08-19 19:16:32,997 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 31 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 19:16:33,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4498590.0, ans=0.125 2024-08-19 19:16:44,681 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 32 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 19:16:46,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4498690.0, ans=0.125 2024-08-19 19:16:54,723 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 19:16:54,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4498690.0, ans=0.1 2024-08-19 19:16:56,171 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
19 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 19:17:07,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4498790.0, ans=0.0 2024-08-19 19:17:19,621 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5300, loss[loss=0.08751, beats_loss=0.01269, ecapa_loss=0.0001581, whisper_loss=0.07324, over 15648.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001418, whisper_loss=0.09065, over 3807573.41 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:17:30,022 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 19:17:34,189 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07655883580446243, model_norm_threshold=51.76279067993164 2024-08-19 19:17:34,345 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.480e+05, grad_sumsq=1.406e+07, orig_rms_sq=1.053e-02 2024-08-19 19:17:39,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4498990.0, ans=0.0 2024-08-19 19:17:46,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4498990.0, ans=0.1 2024-08-19 19:18:30,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.288e+01 2.528e+01 2.946e+01 6.761e+02, threshold=5.056e+01, percent-clipped=1.0 2024-08-19 19:18:42,117 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5350, loss[loss=0.09757, beats_loss=0.01136, ecapa_loss=0.000144, whisper_loss=0.08477, over 22280.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001414, whisper_loss=0.09049, over 3805361.61 frames. 
], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:18:57,364 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05797187611460686, model_norm_threshold=50.55705261230469 2024-08-19 19:18:57,518 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.4.encoder.layers.0.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.406e+05, grad_sumsq=1.406e+05, orig_rms_sq=1.000e+00 2024-08-19 19:19:03,540 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 19:19:05,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4499490.0, ans=0.125 2024-08-19 19:19:05,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4499490.0, ans=0.2 2024-08-19 19:19:14,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4499490.0, ans=0.1 2024-08-19 19:19:30,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4499590.0, ans=0.95 2024-08-19 19:19:56,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4499790.0, ans=0.125 2024-08-19 19:20:03,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2024-08-19 19:20:04,423 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 19:20:13,375 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5400, loss[loss=0.1279, beats_loss=0.008444, ecapa_loss=0.0001375, whisper_loss=0.1181, over 22155.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01037, ecapa_loss=0.0001416, whisper_loss=0.09108, over 3841008.24 frames. ], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:20:21,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4499890.0, ans=0.125 2024-08-19 19:20:27,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4499890.0, ans=0.1 2024-08-19 19:20:29,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4499990.0, ans=0.2 2024-08-19 19:20:31,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4499990.0, ans=0.1 2024-08-19 19:20:38,089 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-19 19:20:46,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-08-19 19:20:49,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4500090.0, ans=0.07 2024-08-19 19:21:02,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4500090.0, ans=0.0 2024-08-19 19:21:27,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.273e+01 2.608e+01 3.002e+01 8.721e+02, threshold=5.217e+01, percent-clipped=3.0 2024-08-19 19:21:29,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4500290.0, ans=0.0 2024-08-19 19:21:31,218 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 19:21:31,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4500290.0, ans=0.05 2024-08-19 19:21:36,543 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 22 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 19:21:39,177 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5450, loss[loss=0.1113, beats_loss=0.01067, ecapa_loss=0.0001003, whisper_loss=0.09964, over 24233.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001401, whisper_loss=0.09097, over 3812699.15 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:21:40,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4500390.0, ans=0.0 2024-08-19 19:21:42,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4500390.0, ans=0.125 2024-08-19 19:22:10,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4500490.0, ans=0.125 2024-08-19 19:22:13,604 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 19:22:13,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4500590.0, ans=0.0 2024-08-19 19:22:15,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2024-08-19 19:22:43,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4500690.0, ans=0.2 2024-08-19 19:23:07,457 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
27 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-19 19:23:08,979 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5500, loss[loss=0.107, beats_loss=0.008637, ecapa_loss=0.0001708, whisper_loss=0.0967, over 21623.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09077, over 3817138.54 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:23:16,662 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:23:21,464 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 19:23:21,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4500890.0, ans=0.0 2024-08-19 19:23:37,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4500990.0, ans=0.2 2024-08-19 19:23:41,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2024-08-19 19:23:49,019 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 21 from LS+wenet, 33 from Vox, 41 fro AS 2024-08-19 19:23:50,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4501090.0, ans=0.0 2024-08-19 19:24:01,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.62 vs. limit=22.5 2024-08-19 19:24:25,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.236e+01 2.438e+01 2.713e+01 9.093e+01, threshold=4.875e+01, percent-clipped=1.0 2024-08-19 19:24:30,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.55 vs. 
limit=15.0 2024-08-19 19:24:39,709 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5550, loss[loss=0.0773, beats_loss=0.01036, ecapa_loss=0.0001311, whisper_loss=0.06563, over 13371.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01028, ecapa_loss=0.0001418, whisper_loss=0.09098, over 3829090.44 frames. ], batch size: 52, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:24:45,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4501390.0, ans=0.025 2024-08-19 19:24:56,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4501390.0, ans=0.2 2024-08-19 19:25:12,607 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:25:26,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4501590.0, ans=0.0 2024-08-19 19:25:42,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4501690.0, ans=0.0 2024-08-19 19:25:59,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4501790.0, ans=0.1 2024-08-19 19:26:13,540 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 19:26:15,402 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5600, loss[loss=0.1009, beats_loss=0.01016, ecapa_loss=0.0001246, whisper_loss=0.08953, over 23372.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01025, ecapa_loss=0.0001423, whisper_loss=0.09036, over 3855209.01 frames. 
], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:26:32,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4501990.0, ans=0.125 2024-08-19 19:26:44,504 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 19:26:54,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4502090.0, ans=0.0 2024-08-19 19:27:02,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4502090.0, ans=0.0 2024-08-19 19:27:04,524 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 13 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 19:27:05,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.98 vs. limit=10.0 2024-08-19 19:27:27,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4502190.0, ans=0.0 2024-08-19 19:27:37,764 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.669e+01 2.290e+01 2.503e+01 2.698e+01 5.557e+01, threshold=5.007e+01, percent-clipped=1.0 2024-08-19 19:27:38,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4502290.0, ans=0.0 2024-08-19 19:27:51,977 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5650, loss[loss=0.1067, beats_loss=0.01077, ecapa_loss=0.0001331, whisper_loss=0.09463, over 18979.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01026, ecapa_loss=0.0001415, whisper_loss=0.09047, over 3857267.83 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:28:11,859 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
14 from LS+wenet, 26 from Vox, 17 fro AS 2024-08-19 19:28:15,514 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 19:28:41,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4502590.0, ans=0.125 2024-08-19 19:28:57,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4502690.0, ans=0.0 2024-08-19 19:29:03,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=22.5 2024-08-19 19:29:04,602 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 19:29:19,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4502790.0, ans=0.2 2024-08-19 19:29:27,594 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5700, loss[loss=0.1022, beats_loss=0.009139, ecapa_loss=0.0001599, whisper_loss=0.09146, over 19152.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01027, ecapa_loss=0.0001414, whisper_loss=0.09036, over 3862705.12 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:29:29,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=12.0 2024-08-19 19:29:31,723 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 19:29:47,194 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 19:29:55,028 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
24 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 19:30:17,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4503090.0, ans=0.0 2024-08-19 19:30:23,053 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 19 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 19:30:41,833 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 25 from LS+wenet, 11 from Vox, 42 fro AS 2024-08-19 19:30:47,392 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 19:30:51,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.276e+01 2.546e+01 2.979e+01 5.244e+01, threshold=5.092e+01, percent-clipped=1.0 2024-08-19 19:31:04,801 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5750, loss[loss=0.08715, beats_loss=0.01231, ecapa_loss=0.0001484, whisper_loss=0.07336, over 17995.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001429, whisper_loss=0.0905, over 3905198.83 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:31:44,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4503590.0, ans=0.2 2024-08-19 19:32:14,509 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 19:32:20,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2024-08-19 19:32:30,815 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
20 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 19:32:31,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4503790.0, ans=0.0 2024-08-19 19:32:35,819 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5800, loss[loss=0.1075, beats_loss=0.00774, ecapa_loss=0.0001186, whisper_loss=0.09858, over 14390.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01028, ecapa_loss=0.0001432, whisper_loss=0.09091, over 3911968.75 frames. ], batch size: 51, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:32:37,971 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 19 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 19:32:53,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4503990.0, ans=0.2 2024-08-19 19:32:55,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4503990.0, ans=0.2 2024-08-19 19:33:07,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-08-19 19:33:09,213 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 19:33:23,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4504090.0, ans=0.0 2024-08-19 19:33:36,292 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 19:33:45,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4504190.0, ans=0.125 2024-08-19 19:33:53,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.99 vs. 
limit=15.0 2024-08-19 19:33:58,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.339e+01 2.561e+01 2.956e+01 4.463e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-19 19:34:07,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=4504290.0, ans=0.05 2024-08-19 19:34:11,251 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5850, loss[loss=0.09919, beats_loss=0.01211, ecapa_loss=0.0001127, whisper_loss=0.08596, over 22851.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001431, whisper_loss=0.09088, over 3944450.01 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:34:13,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4504390.0, ans=0.125 2024-08-19 19:34:18,156 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 35 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-19 19:34:41,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-08-19 19:35:04,120 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 16 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 19:35:12,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-08-19 19:35:20,958 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.357e+01 2024-08-19 19:35:44,443 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5900, loss[loss=0.1184, beats_loss=0.009964, ecapa_loss=0.0001671, whisper_loss=0.1067, over 19095.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001428, whisper_loss=0.09084, over 3903793.36 frames. 
], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:35:50,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4504890.0, ans=0.0 2024-08-19 19:35:53,615 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 19:36:16,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4504990.0, ans=0.035 2024-08-19 19:36:21,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4504990.0, ans=0.125 2024-08-19 19:36:24,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-08-19 19:36:55,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4505190.0, ans=0.125 2024-08-19 19:37:06,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4505290.0, ans=0.125 2024-08-19 19:37:09,564 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.259e+01 2.428e+01 2.766e+01 1.765e+02, threshold=4.857e+01, percent-clipped=1.0 2024-08-19 19:37:17,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4505290.0, ans=0.125 2024-08-19 19:37:23,702 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 5950, loss[loss=0.08908, beats_loss=0.01076, ecapa_loss=0.000154, whisper_loss=0.07678, over 21528.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001424, whisper_loss=0.08992, over 3880346.04 frames. 
], batch size: 87, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:37:24,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4505390.0, ans=0.0 2024-08-19 19:37:24,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-08-19 19:37:52,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4505490.0, ans=0.95 2024-08-19 19:38:05,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4505590.0, ans=0.1 2024-08-19 19:38:15,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4505590.0, ans=0.0 2024-08-19 19:38:29,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4505690.0, ans=0.125 2024-08-19 19:38:34,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=12.0 2024-08-19 19:38:43,577 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 20 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-19 19:38:58,740 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6000, loss[loss=0.09817, beats_loss=0.01166, ecapa_loss=0.0001458, whisper_loss=0.08505, over 20666.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.0001427, whisper_loss=0.08941, over 3826035.65 frames. 
], batch size: 85, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:38:58,741 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-19 19:39:35,527 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005172, whisper_loss=0.2488, over 931116.00 frames. 2024-08-19 19:39:57,509 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames. 2024-08-19 19:41:38,189 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 19:41:38,194 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-19 19:41:46,393 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 35 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 19:42:25,192 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 21 from LS+wenet, 22 from Vox, 50 fro AS 2024-08-19 19:42:55,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.381e+01 2.694e+01 2.980e+01 4.120e+01, threshold=5.388e+01, percent-clipped=0.0 2024-08-19 19:43:07,814 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6050, loss[loss=0.108, beats_loss=0.01103, ecapa_loss=0.0001302, whisper_loss=0.09572, over 22296.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001419, whisper_loss=0.08904, over 3813696.41 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:43:08,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4506390.0, ans=0.125 2024-08-19 19:43:20,427 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 
17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 19:43:32,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4506490.0, ans=0.2 2024-08-19 19:43:58,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-08-19 19:44:07,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4506690.0, ans=0.125 2024-08-19 19:44:16,137 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 19:44:24,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4506790.0, ans=0.125 2024-08-19 19:44:33,907 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 19:44:34,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4506790.0, ans=0.125 2024-08-19 19:44:34,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.37 vs. limit=10.0 2024-08-19 19:44:37,712 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6100, loss[loss=0.07764, beats_loss=0.01413, ecapa_loss=0.0001364, whisper_loss=0.06215, over 18132.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01056, ecapa_loss=0.000141, whisper_loss=0.08861, over 3785412.05 frames. 
], batch size: 76, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:44:59,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4506990.0, ans=0.2 2024-08-19 19:45:07,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4506990.0, ans=0.125 2024-08-19 19:45:09,038 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 21 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-19 19:45:29,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4507190.0, ans=0.125 2024-08-19 19:45:33,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4507190.0, ans=0.09899494936611666 2024-08-19 19:45:34,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4507190.0, ans=0.5 2024-08-19 19:45:38,234 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 19:45:45,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4507290.0, ans=0.125 2024-08-19 19:45:52,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.249e+01 2.614e+01 2.889e+01 5.523e+01, threshold=5.228e+01, percent-clipped=1.0 2024-08-19 19:45:53,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4507290.0, ans=0.125 2024-08-19 19:45:53,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4507290.0, ans=0.125 2024-08-19 19:46:07,433 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6150, loss[loss=0.1043, beats_loss=0.007193, ecapa_loss=0.0001641, whisper_loss=0.09549, over 13306.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01059, ecapa_loss=0.0001401, whisper_loss=0.08845, over 3774276.96 frames. ], batch size: 51, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:46:10,090 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 19:46:11,752 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 17 from LS+wenet, 23 from Vox, 11 fro AS 2024-08-19 19:46:13,312 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 19:46:13,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4507390.0, ans=0.0 2024-08-19 19:46:33,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4507490.0, ans=0.0 2024-08-19 19:47:38,074 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6200, loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001458, whisper_loss=0.08893, over 20626.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01053, ecapa_loss=0.0001394, whisper_loss=0.08966, over 3800656.68 frames. ], batch size: 84, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:47:40,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.10 vs. limit=6.0 2024-08-19 19:47:49,305 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-19 19:48:05,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4507990.0, ans=0.0 2024-08-19 19:48:09,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4507990.0, ans=0.0 2024-08-19 19:48:10,911 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 12 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 19:48:11,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4507990.0, ans=0.125 2024-08-19 19:48:19,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4508090.0, ans=0.2 2024-08-19 19:48:59,692 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.361e+01 2.656e+01 2.980e+01 4.502e+01, threshold=5.312e+01, percent-clipped=0.0 2024-08-19 19:49:04,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4508290.0, ans=0.1 2024-08-19 19:49:14,866 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6250, loss[loss=0.1118, beats_loss=0.01021, ecapa_loss=0.0001816, whisper_loss=0.09981, over 21131.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001415, whisper_loss=0.08963, over 3817719.88 frames. 
], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:49:29,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4508390.0, ans=0.0 2024-08-19 19:49:59,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4508590.0, ans=0.1 2024-08-19 19:50:22,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4508690.0, ans=0.07 2024-08-19 19:50:24,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4508690.0, ans=0.125 2024-08-19 19:50:55,205 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6300, loss[loss=0.09993, beats_loss=0.01172, ecapa_loss=0.0001394, whisper_loss=0.08682, over 21736.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001422, whisper_loss=0.0901, over 3838604.71 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:51:06,503 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 25 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-19 19:51:18,338 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 17 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-19 19:51:27,702 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 19:51:33,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4509090.0, ans=0.0 2024-08-19 19:51:41,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4509090.0, ans=0.125 2024-08-19 19:51:43,110 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 19:51:50,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4509090.0, ans=0.125 2024-08-19 19:52:03,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=12.0 2024-08-19 19:52:10,827 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 18 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-19 19:52:18,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4509290.0, ans=0.0 2024-08-19 19:52:20,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.295e+01 2.454e+01 2.761e+01 3.903e+01, threshold=4.908e+01, percent-clipped=0.0 2024-08-19 19:52:23,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4509290.0, ans=0.1 2024-08-19 19:52:34,340 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6350, loss[loss=0.0828, beats_loss=0.01142, ecapa_loss=0.0001661, whisper_loss=0.06972, over 18668.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.08912, over 3791385.20 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:53:05,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2024-08-19 19:53:20,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.06 vs. 
limit=12.0 2024-08-19 19:53:22,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4509590.0, ans=0.125 2024-08-19 19:54:10,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-19 19:54:13,966 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6400, loss[loss=0.09973, beats_loss=0.01004, ecapa_loss=0.0001677, whisper_loss=0.08802, over 18221.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001417, whisper_loss=0.08923, over 3818345.40 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:54:29,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2024-08-19 19:54:30,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4509890.0, ans=0.1 2024-08-19 19:55:38,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.373e+01 2.627e+01 3.161e+01 1.061e+02, threshold=5.254e+01, percent-clipped=1.0 2024-08-19 19:55:45,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4510290.0, ans=0.2 2024-08-19 19:55:50,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4510390.0, ans=0.0 2024-08-19 19:55:51,588 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6450, loss[loss=0.1093, beats_loss=0.009813, ecapa_loss=0.0001582, whisper_loss=0.09794, over 23008.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001422, whisper_loss=0.09034, over 3802302.75 frames. 
], batch size: 94, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:56:13,798 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 14 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 19:56:14,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4510490.0, ans=15.0 2024-08-19 19:56:25,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4510490.0, ans=0.125 2024-08-19 19:56:48,022 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 19:56:56,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4510690.0, ans=0.125 2024-08-19 19:56:58,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4510690.0, ans=0.2 2024-08-19 19:57:26,790 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.998e+01 2024-08-19 19:57:28,134 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6500, loss[loss=0.1003, beats_loss=0.01262, ecapa_loss=0.0001733, whisper_loss=0.08598, over 20013.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.0001422, whisper_loss=0.08952, over 3796600.71 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:58:23,327 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 19:58:43,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.329e+01 2.544e+01 2.954e+01 4.370e+01, threshold=5.088e+01, percent-clipped=0.0 2024-08-19 19:58:55,527 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6550, loss[loss=0.08394, beats_loss=0.0123, ecapa_loss=0.0001628, whisper_loss=0.07001, over 21460.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001431, whisper_loss=0.08943, over 3825151.05 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 19:59:02,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4511390.0, ans=0.07 2024-08-19 19:59:07,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4511390.0, ans=0.0 2024-08-19 19:59:11,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4511490.0, ans=0.0 2024-08-19 19:59:13,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4511490.0, ans=0.0 2024-08-19 19:59:15,453 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 19:59:40,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=22.5 2024-08-19 19:59:46,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4511690.0, ans=0.1 2024-08-19 19:59:50,061 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 21 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-19 19:59:56,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4511690.0, ans=0.0 2024-08-19 19:59:58,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-19 20:00:09,145 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
25 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-19 20:00:11,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4511790.0, ans=0.0 2024-08-19 20:00:16,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2024-08-19 20:00:21,663 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6600, loss[loss=0.08607, beats_loss=0.0138, ecapa_loss=0.0001089, whisper_loss=0.07118, over 22441.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001443, whisper_loss=0.09021, over 3835958.55 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:00:22,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4511890.0, ans=0.2 2024-08-19 20:01:05,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-19 20:01:06,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4512090.0, ans=0.2 2024-08-19 20:01:34,168 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.409e+01 2.632e+01 2.888e+01 4.355e+02, threshold=5.264e+01, percent-clipped=1.0 2024-08-19 20:01:45,090 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6650, loss[loss=0.1161, beats_loss=0.01026, ecapa_loss=0.0001358, whisper_loss=0.1045, over 22299.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01033, ecapa_loss=0.0001439, whisper_loss=0.09098, over 3861031.72 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:01:51,735 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
30 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 20:02:00,320 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 27 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 20:02:02,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4512490.0, ans=0.1 2024-08-19 20:02:14,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2024-08-19 20:02:27,870 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 29 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 20:02:33,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4512690.0, ans=0.125 2024-08-19 20:02:35,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4512690.0, ans=0.1 2024-08-19 20:02:37,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4512690.0, ans=0.035 2024-08-19 20:03:02,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4512790.0, ans=0.0 2024-08-19 20:03:06,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4512890.0, ans=0.0 2024-08-19 20:03:06,966 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6700, loss[loss=0.1029, beats_loss=0.01222, ecapa_loss=0.0001099, whisper_loss=0.08958, over 22062.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001428, whisper_loss=0.09137, over 3858601.18 frames. 
], batch size: 86, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:03:09,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4512890.0, ans=0.0 2024-08-19 20:03:40,098 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 20:04:03,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4513190.0, ans=0.2 2024-08-19 20:04:20,695 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.359e+01 2.752e+01 3.006e+01 5.924e+01, threshold=5.504e+01, percent-clipped=1.0 2024-08-19 20:04:24,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4513290.0, ans=0.0 2024-08-19 20:04:25,884 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 20:04:30,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4513390.0, ans=0.2 2024-08-19 20:04:31,941 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6750, loss[loss=0.09322, beats_loss=0.01094, ecapa_loss=0.0001165, whisper_loss=0.08111, over 14582.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01039, ecapa_loss=0.0001426, whisper_loss=0.09182, over 3885411.04 frames. ], batch size: 55, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:04:32,185 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 20:04:49,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4513490.0, ans=0.125 2024-08-19 20:05:11,007 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
39 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 20:05:14,274 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.589e+01 2024-08-19 20:05:16,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4513590.0, ans=0.125 2024-08-19 20:05:19,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2024-08-19 20:05:24,735 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.682e-01 2024-08-19 20:05:27,871 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 26 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 20:05:29,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4513690.0, ans=0.0 2024-08-19 20:05:29,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4513690.0, ans=0.125 2024-08-19 20:05:56,269 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6800, loss[loss=0.08828, beats_loss=0.01079, ecapa_loss=0.000134, whisper_loss=0.07615, over 16513.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01038, ecapa_loss=0.0001413, whisper_loss=0.09139, over 3894826.46 frames. 
], batch size: 67, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:05:58,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4513890.0, ans=0.0 2024-08-19 20:06:03,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4513890.0, ans=0.125 2024-08-19 20:06:09,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4513890.0, ans=0.0 2024-08-19 20:06:14,829 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 20:06:25,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-19 20:06:43,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4514190.0, ans=0.1 2024-08-19 20:06:57,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-19 20:07:05,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=15.0 2024-08-19 20:07:06,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-19 20:07:08,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.317e+01 2.530e+01 2.822e+01 4.267e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-19 20:07:17,725 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6850, loss[loss=0.1111, beats_loss=0.009455, ecapa_loss=0.0001755, whisper_loss=0.0999, over 21793.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.0001414, whisper_loss=0.09166, over 3882335.06 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:07:20,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4514390.0, ans=10.0 2024-08-19 20:07:23,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4514390.0, ans=0.0 2024-08-19 20:07:48,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4514490.0, ans=0.0 2024-08-19 20:07:52,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4514590.0, ans=0.0 2024-08-19 20:08:02,865 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 20:08:33,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-08-19 20:08:34,369 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 20:08:37,483 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 20:08:40,594 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6900, loss[loss=0.06987, beats_loss=0.01278, ecapa_loss=0.0001206, whisper_loss=0.05589, over 21870.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001411, whisper_loss=0.09103, over 3866239.96 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:08:42,154 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 20:08:48,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4514890.0, ans=0.125 2024-08-19 20:08:50,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=15.0 2024-08-19 20:08:56,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4514990.0, ans=10.0 2024-08-19 20:08:58,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4514990.0, ans=0.2 2024-08-19 20:09:01,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4514990.0, ans=0.125 2024-08-19 20:09:29,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4515190.0, ans=0.0 2024-08-19 20:09:38,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=22.5 2024-08-19 20:09:49,786 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.290e+01 2.485e+01 2.784e+01 7.248e+01, threshold=4.970e+01, percent-clipped=1.0 2024-08-19 20:09:54,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=12.0 2024-08-19 20:09:58,180 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 20:09:59,382 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 6950, loss[loss=0.1094, beats_loss=0.00997, ecapa_loss=0.0001256, whisper_loss=0.09813, over 18929.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001404, whisper_loss=0.09107, over 3871793.26 frames. 
], batch size: 74, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:10:06,643 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 22 from LS+wenet, 12 from Vox, 41 fro AS 2024-08-19 20:10:06,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4515390.0, ans=0.1 2024-08-19 20:10:13,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2024-08-19 20:10:15,843 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 29 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 20:10:17,531 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 20:10:28,750 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 15 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 20:10:28,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4515490.0, ans=0.1 2024-08-19 20:10:56,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4515690.0, ans=0.125 2024-08-19 20:11:07,057 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 24 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 20:11:19,788 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7000, loss[loss=0.1081, beats_loss=0.008319, ecapa_loss=0.0001587, whisper_loss=0.09821, over 14442.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01027, ecapa_loss=0.0001406, whisper_loss=0.09092, over 3793557.52 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:11:21,889 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 20:11:25,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4515890.0, ans=0.1 2024-08-19 20:11:27,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=22.5 2024-08-19 20:11:45,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4515990.0, ans=0.125 2024-08-19 20:11:56,413 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 20:12:03,010 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-19 20:12:03,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4516090.0, ans=0.125 2024-08-19 20:12:26,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4516290.0, ans=0.0 2024-08-19 20:12:32,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.294e+01 2.487e+01 2.816e+01 5.941e+01, threshold=4.975e+01, percent-clipped=1.0 2024-08-19 20:12:38,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4516290.0, ans=0.1 2024-08-19 20:12:41,754 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7050, loss[loss=0.08487, beats_loss=0.009758, ecapa_loss=0.0001483, whisper_loss=0.07363, over 16943.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.00014, whisper_loss=0.0905, over 3798782.59 frames. 
], batch size: 69, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:12:55,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-08-19 20:12:57,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4516390.0, ans=0.1 2024-08-19 20:12:59,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-08-19 20:13:04,486 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 20:13:14,800 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 26 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-19 20:13:41,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-19 20:13:44,406 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:13:45,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4516690.0, ans=0.125 2024-08-19 20:14:03,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4516790.0, ans=0.2 2024-08-19 20:14:08,905 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7100, loss[loss=0.1059, beats_loss=0.00865, ecapa_loss=0.0001493, whisper_loss=0.09577, over 19649.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001399, whisper_loss=0.08985, over 3795047.85 frames. 
], batch size: 76, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:14:12,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4516890.0, ans=0.2 2024-08-19 20:14:44,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4517090.0, ans=0.5 2024-08-19 20:14:44,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4517090.0, ans=0.125 2024-08-19 20:15:13,634 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 20:15:16,546 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-19 20:15:21,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.242e+01 2.446e+01 2.720e+01 3.661e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-19 20:15:25,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4517290.0, ans=0.125 2024-08-19 20:15:27,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5 2024-08-19 20:15:30,476 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 20:15:31,632 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7150, loss[loss=0.1244, beats_loss=0.008696, ecapa_loss=0.0001538, whisper_loss=0.1142, over 23712.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.09087, over 3835674.54 frames. 
], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:15:33,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4517390.0, ans=0.125 2024-08-19 20:15:38,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4517390.0, ans=0.125 2024-08-19 20:15:57,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0 2024-08-19 20:15:58,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4517490.0, ans=0.1 2024-08-19 20:16:21,924 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 14 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 20:16:29,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4517690.0, ans=0.1 2024-08-19 20:16:52,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4517790.0, ans=0.1 2024-08-19 20:16:55,018 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7200, loss[loss=0.1008, beats_loss=0.009284, ecapa_loss=0.000171, whisper_loss=0.08985, over 16463.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001408, whisper_loss=0.09044, over 3822688.28 frames. ], batch size: 68, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:17:21,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. 
limit=15.0 2024-08-19 20:17:27,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4518090.0, ans=0.0 2024-08-19 20:17:29,311 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 15 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 20:17:38,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4518090.0, ans=0.0 2024-08-19 20:17:40,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4518090.0, ans=0.1 2024-08-19 20:18:01,863 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 13 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 20:18:04,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4518290.0, ans=0.125 2024-08-19 20:18:08,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.204e+01 2.383e+01 2.664e+01 1.113e+02, threshold=4.766e+01, percent-clipped=1.0 2024-08-19 20:18:18,050 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7250, loss[loss=0.1142, beats_loss=0.009011, ecapa_loss=0.0001705, whisper_loss=0.1035, over 23537.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.09042, over 3836046.67 frames. ], batch size: 95, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:18:22,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4518390.0, ans=10.0 2024-08-19 20:18:26,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4518390.0, ans=0.025 2024-08-19 20:18:34,967 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 20:18:38,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-08-19 20:18:44,621 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 20:18:49,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4518590.0, ans=0.125 2024-08-19 20:19:22,869 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 19 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-19 20:19:24,591 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 23 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 20:19:39,699 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7300, loss[loss=0.1015, beats_loss=0.01017, ecapa_loss=0.0001461, whisper_loss=0.08982, over 22941.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.09046, over 3863319.42 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:19:43,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4518890.0, ans=0.125 2024-08-19 20:19:44,625 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 20:19:49,319 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 20:20:07,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=12.0 2024-08-19 20:20:10,649 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 20:20:23,266 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09269597381353378, model_norm_threshold=47.66118240356445 2024-08-19 20:20:23,423 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.810e+04, grad_sumsq=3.810e+04, orig_rms_sq=1.000e+00 2024-08-19 20:20:25,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4519090.0, ans=0.1 2024-08-19 20:20:34,082 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 20:20:48,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4519290.0, ans=0.2 2024-08-19 20:20:53,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.657e+01 2.360e+01 2.664e+01 3.085e+01 5.142e+02, threshold=5.329e+01, percent-clipped=3.0 2024-08-19 20:20:53,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=4519290.0, ans=0.05 2024-08-19 20:21:04,144 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7350, loss[loss=0.07639, beats_loss=0.0122, ecapa_loss=0.0001566, whisper_loss=0.06262, over 15126.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001418, whisper_loss=0.09032, over 3844307.69 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:21:04,388 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 20:21:08,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4519390.0, ans=0.1 2024-08-19 20:21:18,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.83 vs. limit=5.0 2024-08-19 20:21:24,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4519490.0, ans=0.0 2024-08-19 20:21:28,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-19 20:21:35,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4519490.0, ans=0.125 2024-08-19 20:21:43,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4519590.0, ans=0.125 2024-08-19 20:21:52,249 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 27 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-19 20:21:54,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.92 vs. 
limit=22.5 2024-08-19 20:21:59,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4519690.0, ans=0.035 2024-08-19 20:22:12,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4519690.0, ans=0.0 2024-08-19 20:22:19,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4519790.0, ans=0.2 2024-08-19 20:22:35,260 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7400, loss[loss=0.07664, beats_loss=0.0136, ecapa_loss=0.0001156, whisper_loss=0.06188, over 13133.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.08971, over 3828525.38 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:22:47,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4519890.0, ans=0.5 2024-08-19 20:22:52,126 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-452000.pt 2024-08-19 20:22:54,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4519990.0, ans=0.1 2024-08-19 20:23:02,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4519990.0, ans=0.2 2024-08-19 20:23:10,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4520090.0, ans=0.125 2024-08-19 20:23:14,801 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 20:23:15,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-08-19 20:23:22,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4520090.0, ans=0.125 2024-08-19 20:23:53,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.289e+01 2.494e+01 2.789e+01 3.959e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-19 20:24:04,771 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7450, loss[loss=0.1154, beats_loss=0.008596, ecapa_loss=0.0001503, whisper_loss=0.1053, over 19394.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01035, ecapa_loss=0.000143, whisper_loss=0.09023, over 3821971.46 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:24:08,661 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 20:24:19,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-19 20:24:23,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4520490.0, ans=0.09899494936611666 2024-08-19 20:24:27,901 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 19 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 20:24:30,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2024-08-19 20:24:31,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4520490.0, ans=0.1 2024-08-19 20:24:36,916 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 20:25:02,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4520690.0, ans=0.0 2024-08-19 20:25:03,291 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 20:25:10,985 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 20:25:11,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4520690.0, ans=0.125 2024-08-19 20:25:16,145 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 20:25:23,276 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 20:25:35,132 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7500, loss[loss=0.09894, beats_loss=0.01062, ecapa_loss=0.0001195, whisper_loss=0.08712, over 22906.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001434, whisper_loss=0.09016, over 3816440.16 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:25:39,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4520890.0, ans=0.1 2024-08-19 20:26:07,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4520990.0, ans=0.0 2024-08-19 20:26:07,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4520990.0, ans=0.125 2024-08-19 20:26:12,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4521090.0, ans=0.125 2024-08-19 20:26:16,331 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 20:26:27,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4521090.0, ans=0.0 2024-08-19 20:26:31,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4521190.0, ans=0.1 2024-08-19 20:26:42,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.23 vs. limit=6.0 2024-08-19 20:26:45,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4521190.0, ans=0.125 2024-08-19 20:26:55,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4521290.0, ans=0.0 2024-08-19 20:26:56,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.301e+01 2.566e+01 2.939e+01 6.434e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-19 20:27:06,798 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7550, loss[loss=0.1001, beats_loss=0.008674, ecapa_loss=0.0001765, whisper_loss=0.08964, over 15421.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001433, whisper_loss=0.08988, over 3810641.54 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:27:07,033 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 18 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 20:27:26,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. 
limit=15.0 2024-08-19 20:27:30,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4521490.0, ans=0.1 2024-08-19 20:27:38,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4521490.0, ans=0.125 2024-08-19 20:27:44,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4521590.0, ans=0.125 2024-08-19 20:27:48,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4521590.0, ans=0.0 2024-08-19 20:28:05,712 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 15 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 20:28:37,574 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:28:37,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2024-08-19 20:28:40,256 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7600, loss[loss=0.1028, beats_loss=0.01081, ecapa_loss=0.0001557, whisper_loss=0.09046, over 21926.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001436, whisper_loss=0.08997, over 3823682.57 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:28:42,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4521890.0, ans=0.125 2024-08-19 20:29:15,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4521990.0, ans=0.125 2024-08-19 20:29:32,009 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
25 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 20:29:41,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4522190.0, ans=0.0 2024-08-19 20:29:55,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4522290.0, ans=0.1 2024-08-19 20:30:04,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.242e+01 2.534e+01 2.867e+01 1.676e+03, threshold=5.067e+01, percent-clipped=0.0 2024-08-19 20:30:04,591 WARNING [optim.py:496] (0/4) Scaling gradients by 0.030242323875427246, model_norm_threshold=50.67152404785156 2024-08-19 20:30:04,747 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.977e+05, grad_sumsq=7.564e+07, orig_rms_sq=1.055e-02 2024-08-19 20:30:12,132 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 20:30:15,787 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7650, loss[loss=0.1042, beats_loss=0.007673, ecapa_loss=0.0001447, whisper_loss=0.09503, over 12954.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001431, whisper_loss=0.09071, over 3816547.36 frames. ], batch size: 50, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:30:28,803 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 20:30:29,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.46 vs. 
limit=10.0 2024-08-19 20:30:38,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4522490.0, ans=0.125 2024-08-19 20:30:50,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4522590.0, ans=0.0 2024-08-19 20:30:52,484 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 20:30:52,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4522590.0, ans=0.1 2024-08-19 20:31:01,458 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 20:31:46,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4522790.0, ans=0.0 2024-08-19 20:31:50,080 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7700, loss[loss=0.1327, beats_loss=0.007355, ecapa_loss=0.0001293, whisper_loss=0.124, over 19202.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001426, whisper_loss=0.09039, over 3807944.36 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 20:31:58,133 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 20 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 20:32:31,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4523090.0, ans=0.125 2024-08-19 20:33:10,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.323e+01 2.517e+01 2.796e+01 4.474e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-19 20:33:19,415 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7750, loss[loss=0.09347, beats_loss=0.009128, ecapa_loss=0.0001245, whisper_loss=0.08309, over 12885.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001412, whisper_loss=0.09031, over 3781569.62 frames. ], batch size: 50, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:33:33,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4523390.0, ans=0.2 2024-08-19 20:33:36,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-08-19 20:33:42,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4523490.0, ans=0.2 2024-08-19 20:33:49,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4523490.0, ans=0.2 2024-08-19 20:33:53,613 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 20:34:02,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4523590.0, ans=0.125 2024-08-19 20:34:05,602 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 20:34:05,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4523590.0, ans=0.0 2024-08-19 20:34:07,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4523590.0, ans=0.04949747468305833 2024-08-19 20:34:31,934 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
31 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 20:34:37,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4523790.0, ans=0.015 2024-08-19 20:34:50,101 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7800, loss[loss=0.08502, beats_loss=0.01192, ecapa_loss=0.0001675, whisper_loss=0.07143, over 15061.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001415, whisper_loss=0.09036, over 3818691.15 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:34:58,650 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 31 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 20:35:21,630 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 34 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 20:35:25,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4524090.0, ans=0.125 2024-08-19 20:35:27,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4524090.0, ans=0.1 2024-08-19 20:35:27,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4524090.0, ans=0.125 2024-08-19 20:35:39,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4524090.0, ans=0.125 2024-08-19 20:35:40,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4524190.0, ans=0.125 2024-08-19 20:36:09,155 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.228e+01 2.464e+01 2.830e+01 4.593e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-19 20:36:18,153 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7850, loss[loss=0.08217, beats_loss=0.01126, ecapa_loss=0.0001368, whisper_loss=0.06954, over 18222.00 
frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001415, whisper_loss=0.09006, over 3796776.42 frames. ], batch size: 72, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:36:21,809 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 20:36:25,175 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 20 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-19 20:36:36,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4524490.0, ans=0.0 2024-08-19 20:36:43,358 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 29 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 20:36:50,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2024-08-19 20:36:56,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4524590.0, ans=0.0 2024-08-19 20:37:27,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4524790.0, ans=0.09899494936611666 2024-08-19 20:37:46,788 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7900, loss[loss=0.09623, beats_loss=0.01004, ecapa_loss=0.0001659, whisper_loss=0.08453, over 20781.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001416, whisper_loss=0.09035, over 3811988.57 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:37:50,424 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 14 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 20:38:06,073 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 20:38:30,646 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 20:38:52,139 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-19 20:39:06,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.308e+01 2.634e+01 2.974e+01 4.173e+01, threshold=5.267e+01, percent-clipped=0.0 2024-08-19 20:39:15,489 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 7950, loss[loss=0.09981, beats_loss=0.0102, ecapa_loss=0.0001588, whisper_loss=0.08802, over 21473.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.09006, over 3862773.04 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:39:24,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4525390.0, ans=0.125 2024-08-19 20:39:31,519 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 18 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 20:39:44,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2024-08-19 20:39:47,063 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 20:39:49,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4525590.0, ans=0.5 2024-08-19 20:40:00,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0 2024-08-19 20:40:04,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.92 vs. 
limit=15.0 2024-08-19 20:40:10,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4525690.0, ans=0.2 2024-08-19 20:40:11,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4525690.0, ans=0.1 2024-08-19 20:40:20,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4525690.0, ans=0.125 2024-08-19 20:40:25,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4525790.0, ans=0.125 2024-08-19 20:40:29,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0 2024-08-19 20:40:35,252 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 30 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 20:40:42,438 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8000, loss[loss=0.07917, beats_loss=0.01182, ecapa_loss=0.0001278, whisper_loss=0.06608, over 14415.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.08984, over 3847836.80 frames. 
], batch size: 54, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:41:46,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4526190.0, ans=0.0 2024-08-19 20:41:59,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4526290.0, ans=0.1 2024-08-19 20:42:05,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.388e+01 2.587e+01 2.894e+01 4.259e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-19 20:42:06,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-19 20:42:15,376 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8050, loss[loss=0.07256, beats_loss=0.01262, ecapa_loss=0.0001084, whisper_loss=0.05885, over 13068.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001418, whisper_loss=0.09019, over 3849074.25 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:42:17,561 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 35 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 20:42:21,204 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 20:42:22,636 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 38 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 20:42:27,689 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 20:42:28,992 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 18 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-19 20:42:36,030 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
11 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 20:43:21,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=12.0 2024-08-19 20:43:30,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-19 20:43:34,000 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-19 20:43:34,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4526790.0, ans=0.125 2024-08-19 20:43:49,409 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8100, loss[loss=0.08269, beats_loss=0.01218, ecapa_loss=0.0001494, whisper_loss=0.06901, over 21405.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001405, whisper_loss=0.0898, over 3815409.55 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:44:24,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.05 vs. limit=22.5 2024-08-19 20:44:29,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4527090.0, ans=0.125 2024-08-19 20:44:50,399 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 28 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 20:45:12,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4527290.0, ans=0.125 2024-08-19 20:45:17,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.23 vs. 
limit=22.5 2024-08-19 20:45:20,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.340e+01 2.531e+01 2.954e+01 4.685e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-19 20:45:23,115 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 20:45:23,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4527290.0, ans=0.0 2024-08-19 20:45:30,750 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8150, loss[loss=0.08277, beats_loss=0.01152, ecapa_loss=0.0001422, whisper_loss=0.06983, over 19008.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001408, whisper_loss=0.09019, over 3787362.66 frames. ], batch size: 80, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:45:30,979 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 20:45:33,491 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 36 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-19 20:45:39,456 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 20:45:49,925 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 20:46:14,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2024-08-19 20:46:31,697 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 20:46:40,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4527690.0, ans=0.125 2024-08-19 20:46:48,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.98 vs. limit=10.0 2024-08-19 20:46:58,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 20:47:07,990 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8200, loss[loss=0.09636, beats_loss=0.01175, ecapa_loss=9.805e-05, whisper_loss=0.08363, over 21549.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001403, whisper_loss=0.08973, over 3789728.60 frames. ], batch size: 80, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:47:12,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4527890.0, ans=0.1 2024-08-19 20:47:18,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4527890.0, ans=0.1 2024-08-19 20:47:38,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4527990.0, ans=0.125 2024-08-19 20:47:40,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4527990.0, ans=0.125 2024-08-19 20:47:42,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-19 20:47:46,142 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 20:47:49,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4528090.0, ans=0.125 2024-08-19 20:47:57,146 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 24 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 20:48:23,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4528290.0, ans=0.04949747468305833 2024-08-19 20:48:30,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.02 vs. limit=10.0 2024-08-19 20:48:35,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.270e+01 2.400e+01 2.604e+01 4.192e+01, threshold=4.800e+01, percent-clipped=0.0 2024-08-19 20:48:35,603 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 31 from LS+wenet, 16 from Vox, 49 fro AS 2024-08-19 20:48:44,749 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8250, loss[loss=0.08665, beats_loss=0.01122, ecapa_loss=0.00014, whisper_loss=0.07403, over 18064.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01056, ecapa_loss=0.0001412, whisper_loss=0.08885, over 3818060.72 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:49:07,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4528490.0, ans=0.125 2024-08-19 20:49:11,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4528490.0, ans=0.2 2024-08-19 20:49:42,452 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 20:50:09,882 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 
15 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-19 20:50:12,647 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-19 20:50:13,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-19 20:50:19,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4528890.0, ans=0.2 2024-08-19 20:50:20,490 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8300, loss[loss=0.1104, beats_loss=0.01084, ecapa_loss=9.218e-05, whisper_loss=0.09863, over 17332.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001407, whisper_loss=0.08862, over 3796125.90 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:50:40,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4528990.0, ans=0.1 2024-08-19 20:50:50,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=22.5 2024-08-19 20:50:57,036 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 20:51:42,723 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.312e+01 2.525e+01 2.734e+01 6.042e+01, threshold=5.050e+01, percent-clipped=1.0 2024-08-19 20:51:51,661 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8350, loss[loss=0.09015, beats_loss=0.01248, ecapa_loss=0.0001174, whisper_loss=0.07649, over 17615.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01058, ecapa_loss=0.0001407, whisper_loss=0.0882, over 3779270.55 frames. 
], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:52:13,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-08-19 20:52:30,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4529590.0, ans=0.125 2024-08-19 20:52:39,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4529590.0, ans=0.125 2024-08-19 20:52:45,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-19 20:53:01,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4529690.0, ans=0.1 2024-08-19 20:53:07,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-08-19 20:53:29,281 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8400, loss[loss=0.09916, beats_loss=0.01169, ecapa_loss=0.0001117, whisper_loss=0.08636, over 22332.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001411, whisper_loss=0.08885, over 3812474.08 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:53:29,531 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 35 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 20:53:29,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4529890.0, ans=0.125 2024-08-19 20:53:30,891 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 20:53:36,616 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 20:53:43,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4529890.0, ans=0.5 2024-08-19 20:53:47,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0 2024-08-19 20:53:53,351 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 20:53:53,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. limit=10.0 2024-08-19 20:53:56,653 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 29 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 20:54:00,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4529990.0, ans=0.125 2024-08-19 20:54:13,511 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 20:54:22,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4530190.0, ans=0.0 2024-08-19 20:54:29,924 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 21 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 20:54:30,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4530190.0, ans=0.035 2024-08-19 20:54:31,716 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 
23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 20:54:48,931 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.271e+01 2.577e+01 2.838e+01 4.178e+01, threshold=5.155e+01, percent-clipped=0.0 2024-08-19 20:54:59,405 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8450, loss[loss=0.1167, beats_loss=0.008567, ecapa_loss=0.0001527, whisper_loss=0.1066, over 22626.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001411, whisper_loss=0.08974, over 3802624.60 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:54:59,640 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 20:55:28,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4530490.0, ans=0.1 2024-08-19 20:55:39,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2024-08-19 20:55:58,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4530690.0, ans=0.125 2024-08-19 20:56:05,235 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 30 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 20:56:10,837 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-19 20:56:23,564 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 20:56:35,837 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 20:56:40,126 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8500, loss[loss=0.1044, beats_loss=0.01091, ecapa_loss=0.0001374, whisper_loss=0.09216, over 17270.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001411, whisper_loss=0.09, over 3815811.08 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:56:46,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4530890.0, ans=0.0 2024-08-19 20:56:58,580 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 27 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 20:57:03,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4530990.0, ans=0.125 2024-08-19 20:57:11,076 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 20:57:15,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4530990.0, ans=0.0 2024-08-19 20:57:34,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=12.0 2024-08-19 20:57:36,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-08-19 20:57:37,192 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 22 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 20:57:54,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4531190.0, ans=0.0 2024-08-19 20:58:13,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.342e+01 2.609e+01 2.850e+01 2.704e+02, threshold=5.218e+01, percent-clipped=2.0 2024-08-19 20:58:21,755 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 
12 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 20:58:23,654 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8550, loss[loss=0.07545, beats_loss=0.01041, ecapa_loss=0.0001495, whisper_loss=0.06354, over 13701.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001415, whisper_loss=0.09033, over 3856927.66 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 20:58:36,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4531390.0, ans=0.125 2024-08-19 20:58:36,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4531390.0, ans=0.0 2024-08-19 20:59:02,321 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 20:59:02,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4531590.0, ans=0.0 2024-08-19 20:59:08,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4531590.0, ans=0.0 2024-08-19 20:59:21,391 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 31 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 20:59:30,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-19 20:59:34,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4531690.0, ans=0.0 2024-08-19 20:59:43,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4531790.0, ans=0.0 2024-08-19 20:59:59,581 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8600, loss[loss=0.08829, beats_loss=0.0109, ecapa_loss=0.0001254, whisper_loss=0.07614, over 14384.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.09011, over 3848636.22 frames. ], batch size: 56, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:00:01,409 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 37 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 21:00:27,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4531990.0, ans=0.2 2024-08-19 21:00:39,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-08-19 21:00:41,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4532090.0, ans=0.1 2024-08-19 21:00:50,665 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 21:01:00,999 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 21:01:03,287 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 21:01:30,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.352e+01 2.546e+01 2.927e+01 4.092e+01, threshold=5.093e+01, percent-clipped=0.0 2024-08-19 21:01:37,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4532390.0, ans=0.125 2024-08-19 21:01:39,168 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8650, loss[loss=0.112, beats_loss=0.009425, ecapa_loss=0.0001567, whisper_loss=0.101, over 23313.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001405, whisper_loss=0.09026, over 3837788.10 frames. 
], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:01:48,459 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:01:48,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4532390.0, ans=0.2 2024-08-19 21:01:52,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2024-08-19 21:02:04,014 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 24 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-19 21:02:14,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4532490.0, ans=0.125 2024-08-19 21:02:16,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4532490.0, ans=0.015 2024-08-19 21:02:16,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4532490.0, ans=0.0 2024-08-19 21:02:22,461 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 21:02:35,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4532590.0, ans=0.1 2024-08-19 21:02:44,428 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 21:03:02,030 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 21:03:08,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4532790.0, ans=0.125 2024-08-19 21:03:11,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4532790.0, ans=0.1 2024-08-19 21:03:14,436 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8700, loss[loss=0.08657, beats_loss=0.01193, ecapa_loss=0.0001337, whisper_loss=0.0733, over 15920.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.09003, over 3822275.62 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:03:18,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4532890.0, ans=0.125 2024-08-19 21:03:33,670 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-19 21:03:39,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2024-08-19 21:03:48,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.65 vs. limit=15.0 2024-08-19 21:03:57,629 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
29 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 21:04:19,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4533190.0, ans=0.2 2024-08-19 21:04:34,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.263e+01 2.457e+01 2.766e+01 3.922e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-19 21:04:43,640 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8750, loss[loss=0.1137, beats_loss=0.01046, ecapa_loss=0.0001276, whisper_loss=0.102, over 18561.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001408, whisper_loss=0.08915, over 3783934.04 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:04:43,877 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 22 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 21:05:14,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4533490.0, ans=0.125 2024-08-19 21:05:18,931 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 18 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-19 21:05:22,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4533590.0, ans=0.2 2024-08-19 21:05:25,968 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 20 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-19 21:06:04,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4533790.0, ans=0.0 2024-08-19 21:06:10,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0 2024-08-19 21:06:16,981 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8800, loss[loss=0.1074, beats_loss=0.01089, ecapa_loss=0.0001185, whisper_loss=0.0953, over 17910.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.000141, whisper_loss=0.08903, over 3781041.54 frames. ], batch size: 69, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:06:17,481 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 24 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 21:06:21,106 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 15 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 21:06:40,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4533990.0, ans=0.09899494936611666 2024-08-19 21:06:59,094 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 21:07:05,670 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 21:07:09,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4534190.0, ans=0.125 2024-08-19 21:07:19,135 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 21:07:33,672 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.239e+01 2.468e+01 2.713e+01 3.674e+01, threshold=4.936e+01, percent-clipped=0.0 2024-08-19 21:07:34,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4534290.0, ans=0.125 2024-08-19 21:07:35,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4534290.0, ans=0.1 2024-08-19 21:07:42,247 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8850, loss[loss=0.1056, beats_loss=0.009642, ecapa_loss=0.000148, whisper_loss=0.09446, over 22698.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.08874, over 3731839.70 frames. 
], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:07:44,338 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 25 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 21:07:59,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4534490.0, ans=0.0 2024-08-19 21:08:09,372 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 21:08:10,597 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05975715443491936, model_norm_threshold=49.35716247558594 2024-08-19 21:08:10,753 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.conv_module1.depthwise_conv.causal_conv.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.671e+04, grad_sumsq=1.079e+05, orig_rms_sq=6.184e-01 2024-08-19 21:08:15,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4534590.0, ans=0.0 2024-08-19 21:08:20,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-08-19 21:08:23,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4534590.0, ans=0.125 2024-08-19 21:08:25,280 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 21:08:25,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4534590.0, ans=0.2 2024-08-19 21:08:28,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.18 vs. 
limit=12.0 2024-08-19 21:08:44,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4534690.0, ans=0.125 2024-08-19 21:08:52,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4534790.0, ans=0.1 2024-08-19 21:09:03,743 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8900, loss[loss=0.09004, beats_loss=0.01026, ecapa_loss=0.0001453, whisper_loss=0.07833, over 16790.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01051, ecapa_loss=0.0001417, whisper_loss=0.08819, over 3748864.77 frames. ], batch size: 67, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:09:07,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4534890.0, ans=0.125 2024-08-19 21:09:10,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4534890.0, ans=0.0 2024-08-19 21:09:13,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4534890.0, ans=0.1 2024-08-19 21:09:27,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4534990.0, ans=0.125 2024-08-19 21:09:33,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.31 vs. limit=22.5 2024-08-19 21:09:54,888 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 39 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 21:10:07,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.63 vs. 
limit=12.0 2024-08-19 21:10:14,502 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.373e+01 2.651e+01 2.938e+01 8.260e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-19 21:10:16,778 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 27 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 21:10:22,801 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 8950, loss[loss=0.106, beats_loss=0.00876, ecapa_loss=0.0001443, whisper_loss=0.09576, over 21219.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01047, ecapa_loss=0.0001415, whisper_loss=0.08862, over 3747989.55 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:10:30,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4535390.0, ans=0.04949747468305833 2024-08-19 21:10:30,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-19 21:10:35,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4535390.0, ans=0.0 2024-08-19 21:10:43,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2024-08-19 21:10:57,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4535590.0, ans=0.1 2024-08-19 21:10:59,269 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 21:11:15,584 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
29 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 21:11:17,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4535690.0, ans=0.1 2024-08-19 21:11:19,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4535690.0, ans=0.2 2024-08-19 21:11:22,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4535690.0, ans=0.125 2024-08-19 21:11:28,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4535690.0, ans=0.125 2024-08-19 21:11:36,473 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 21:11:40,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4535790.0, ans=0.0 2024-08-19 21:11:42,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4535790.0, ans=0.1 2024-08-19 21:11:50,876 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9000, loss[loss=0.09304, beats_loss=0.01147, ecapa_loss=0.0001521, whisper_loss=0.08005, over 14208.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01048, ecapa_loss=0.000142, whisper_loss=0.08867, over 3783710.61 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:11:50,877 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-19 21:12:02,638 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4316, 1.9975, 4.4556, 4.8026], device='cuda:0') 2024-08-19 21:12:26,907 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005115, whisper_loss=0.248, over 931116.00 frames. 
2024-08-19 21:12:49,663 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003978, beats_loss=0, ecapa_loss=0.0003978, whisper_loss=0, over 944235.00 frames. 2024-08-19 21:13:40,538 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2381, 1.9231, 2.0873, 1.8934], device='cuda:0') 2024-08-19 21:14:15,975 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7562, 2.0290, 2.5111, 1.1868], device='cuda:0') 2024-08-19 21:14:27,785 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 21:14:27,789 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-19 21:14:41,052 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 15 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 21:14:45,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4535990.0, ans=0.125 2024-08-19 21:15:00,391 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-19 21:15:52,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4536290.0, ans=0.0 2024-08-19 21:15:55,805 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-19 21:15:56,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. 
limit=15.0 2024-08-19 21:16:01,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.332e+01 2.634e+01 2.859e+01 8.780e+01, threshold=5.268e+01, percent-clipped=1.0 2024-08-19 21:16:11,960 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9050, loss[loss=0.1027, beats_loss=0.01019, ecapa_loss=0.0001436, whisper_loss=0.09108, over 18708.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001415, whisper_loss=0.08938, over 3790652.41 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:16:12,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4536390.0, ans=0.125 2024-08-19 21:16:22,124 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 21:16:41,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4536490.0, ans=0.125 2024-08-19 21:16:52,225 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07672171294689178, model_norm_threshold=52.675148010253906 2024-08-19 21:16:52,382 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.217e+05, grad_sumsq=1.217e+05, orig_rms_sq=1.000e+00 2024-08-19 21:16:52,660 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
22 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 21:17:02,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4536590.0, ans=0.04949747468305833 2024-08-19 21:17:17,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4536690.0, ans=0.2 2024-08-19 21:17:19,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0 2024-08-19 21:17:24,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4536690.0, ans=0.1 2024-08-19 21:17:30,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4536690.0, ans=0.125 2024-08-19 21:17:32,138 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 20 from LS+wenet, 11 from Vox, 18 fro AS 2024-08-19 21:17:39,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=12.0 2024-08-19 21:17:44,234 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 38 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 21:17:52,592 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9100, loss[loss=0.1107, beats_loss=0.009255, ecapa_loss=0.000134, whisper_loss=0.1001, over 14887.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001422, whisper_loss=0.08905, over 3804038.52 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:18:03,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4536890.0, ans=0.125 2024-08-19 21:18:09,230 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 21:18:14,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4536990.0, ans=0.1 2024-08-19 21:18:14,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4536990.0, ans=0.1 2024-08-19 21:18:22,394 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 21:18:24,065 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 21:18:50,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4537090.0, ans=0.125 2024-08-19 21:19:00,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-19 21:19:09,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4537190.0, ans=0.1 2024-08-19 21:19:24,564 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.187e+01 2.511e+01 2.857e+01 6.866e+02, threshold=5.022e+01, percent-clipped=2.0 2024-08-19 21:19:25,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4537290.0, ans=0.125 2024-08-19 21:19:25,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4537290.0, ans=0.0 2024-08-19 21:19:32,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4537390.0, ans=0.05 2024-08-19 21:19:34,258 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9150, loss[loss=0.1209, beats_loss=0.009981, ecapa_loss=0.0001377, 
whisper_loss=0.1096, over 16574.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001415, whisper_loss=0.08971, over 3824010.88 frames. ], batch size: 64, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:19:50,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4537390.0, ans=10.0 2024-08-19 21:19:54,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=12.0 2024-08-19 21:19:58,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4537490.0, ans=0.125 2024-08-19 21:20:03,940 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 25 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 21:20:09,436 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 21:20:11,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4537590.0, ans=0.5 2024-08-19 21:20:25,893 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 21:20:41,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4537690.0, ans=0.05 2024-08-19 21:20:43,055 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:21:09,827 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9200, loss[loss=0.1005, beats_loss=0.00775, ecapa_loss=0.0001562, whisper_loss=0.09121, over 14894.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.08953, over 3792241.25 frames. 
], batch size: 59, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:21:12,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4537890.0, ans=0.0 2024-08-19 21:21:32,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4537990.0, ans=0.2 2024-08-19 21:21:37,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4537990.0, ans=0.0 2024-08-19 21:22:15,588 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 21:22:35,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4538290.0, ans=0.125 2024-08-19 21:22:39,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.332e+01 2.620e+01 2.969e+01 6.711e+01, threshold=5.240e+01, percent-clipped=2.0 2024-08-19 21:22:49,871 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9250, loss[loss=0.1049, beats_loss=0.009761, ecapa_loss=0.0001237, whisper_loss=0.09386, over 17462.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001417, whisper_loss=0.09019, over 3805481.94 frames. ], batch size: 67, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:23:09,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-19 21:23:13,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4538490.0, ans=0.025 2024-08-19 21:23:18,900 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
22 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-19 21:23:26,678 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:23:33,600 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 17 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-19 21:23:37,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4538590.0, ans=0.125 2024-08-19 21:24:03,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4538690.0, ans=0.125 2024-08-19 21:24:05,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4538790.0, ans=0.0 2024-08-19 21:24:20,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4538790.0, ans=0.2 2024-08-19 21:24:25,335 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9300, loss[loss=0.09448, beats_loss=0.01285, ecapa_loss=0.0001097, whisper_loss=0.08053, over 23377.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001411, whisper_loss=0.09035, over 3821480.86 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:24:57,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4538990.0, ans=0.0 2024-08-19 21:24:57,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4538990.0, ans=0.0 2024-08-19 21:25:12,219 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 21:25:28,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.48 vs. 
limit=22.5 2024-08-19 21:25:50,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.320e+01 2.550e+01 2.886e+01 6.142e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-19 21:25:53,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-08-19 21:25:56,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-19 21:26:00,078 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9350, loss[loss=0.09139, beats_loss=0.0105, ecapa_loss=0.0001424, whisper_loss=0.07947, over 19439.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001397, whisper_loss=0.09014, over 3861213.80 frames. ], batch size: 80, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:26:02,198 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 21:26:15,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4539390.0, ans=0.05 2024-08-19 21:26:36,636 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 21:26:41,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4539590.0, ans=0.0 2024-08-19 21:26:52,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4539590.0, ans=0.125 2024-08-19 21:27:33,476 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9400, loss[loss=0.1072, beats_loss=0.01092, ecapa_loss=0.000118, whisper_loss=0.09514, over 20047.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001395, whisper_loss=0.08967, over 3847565.62 frames. 
], batch size: 77, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:27:49,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4539890.0, ans=0.1 2024-08-19 21:28:02,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4539990.0, ans=10.0 2024-08-19 21:28:10,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4540090.0, ans=0.0 2024-08-19 21:28:34,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4540190.0, ans=0.1 2024-08-19 21:28:39,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-08-19 21:28:57,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+01 2.227e+01 2.494e+01 2.747e+01 6.860e+01, threshold=4.987e+01, percent-clipped=1.0 2024-08-19 21:28:57,331 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 36 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 21:29:07,223 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9450, loss[loss=0.1157, beats_loss=0.01073, ecapa_loss=0.0001227, whisper_loss=0.1038, over 22346.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001403, whisper_loss=0.0898, over 3876412.79 frames. 
], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:29:07,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4540390.0, ans=0.2 2024-08-19 21:29:30,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4540490.0, ans=0.125 2024-08-19 21:29:37,031 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 16 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 21:29:57,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4540590.0, ans=0.125 2024-08-19 21:30:03,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-08-19 21:30:29,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.19 vs. limit=22.5 2024-08-19 21:30:34,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4540790.0, ans=0.125 2024-08-19 21:30:35,973 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-19 21:30:46,435 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 21:30:48,106 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9500, loss[loss=0.1065, beats_loss=0.009908, ecapa_loss=0.0001763, whisper_loss=0.09482, over 22513.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001419, whisper_loss=0.09017, over 3871110.30 frames. 
], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:30:48,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4540890.0, ans=0.025 2024-08-19 21:31:01,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=15.0 2024-08-19 21:31:02,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4540890.0, ans=0.125 2024-08-19 21:31:10,244 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 18 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 21:31:31,831 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 21:31:38,031 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 21:31:54,115 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-19 21:31:55,580 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 21:31:58,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4541190.0, ans=0.04949747468305833 2024-08-19 21:32:15,586 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 21:32:17,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.220e+01 2.482e+01 2.741e+01 4.040e+01, threshold=4.965e+01, percent-clipped=0.0 2024-08-19 21:32:27,362 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9550, loss[loss=0.07736, beats_loss=0.01253, ecapa_loss=0.0001193, whisper_loss=0.06364, over 17918.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001423, whisper_loss=0.08982, over 3832229.13 frames. 
], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:32:41,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4541390.0, ans=0.0 2024-08-19 21:32:47,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-19 21:33:01,386 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 7 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 21:33:05,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-19 21:33:18,500 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 21:33:20,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4541590.0, ans=0.125 2024-08-19 21:33:29,390 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 18 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-19 21:33:33,790 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 21:33:45,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4541790.0, ans=0.0 2024-08-19 21:34:02,971 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9600, loss[loss=0.08887, beats_loss=0.0116, ecapa_loss=0.0001375, whisper_loss=0.07589, over 19935.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001428, whisper_loss=0.09009, over 3818218.99 frames. 
], batch size: 80, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:34:11,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4541890.0, ans=0.0 2024-08-19 21:34:28,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4541990.0, ans=0.1 2024-08-19 21:34:28,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4541990.0, ans=0.1 2024-08-19 21:34:28,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4541990.0, ans=0.1 2024-08-19 21:34:47,243 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 17 from LS+wenet, 24 from Vox, 14 fro AS 2024-08-19 21:35:15,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4542190.0, ans=10.0 2024-08-19 21:35:16,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4542190.0, ans=0.125 2024-08-19 21:35:16,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4542190.0, ans=0.0 2024-08-19 21:35:18,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4542190.0, ans=0.05 2024-08-19 21:35:26,655 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 
30 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 21:35:30,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4542290.0, ans=0.0 2024-08-19 21:35:34,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.267e+01 2.509e+01 2.746e+01 4.719e+01, threshold=5.019e+01, percent-clipped=0.0 2024-08-19 21:35:39,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4542290.0, ans=0.125 2024-08-19 21:35:45,885 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9650, loss[loss=0.1017, beats_loss=0.01123, ecapa_loss=0.000129, whisper_loss=0.08918, over 13438.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001422, whisper_loss=0.08965, over 3814430.13 frames. ], batch size: 52, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:35:57,929 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 18 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 21:36:01,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4542390.0, ans=0.125 2024-08-19 21:36:01,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4542390.0, ans=0.1 2024-08-19 21:36:19,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4542490.0, ans=0.1 2024-08-19 21:36:26,840 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 21:36:34,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4542590.0, ans=0.125 2024-08-19 21:36:38,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2024-08-19 21:36:40,596 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 21:36:45,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-19 21:36:52,483 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 29 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 21:37:25,894 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 21:37:29,848 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9700, loss[loss=0.08608, beats_loss=0.01153, ecapa_loss=0.0001255, whisper_loss=0.07329, over 17765.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01043, ecapa_loss=0.0001428, whisper_loss=0.08859, over 3753218.64 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 21:37:37,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4542890.0, ans=0.09899494936611666 2024-08-19 21:38:15,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4543090.0, ans=0.125 2024-08-19 21:38:27,705 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 13 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 21:38:55,165 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
22 from LS+wenet, 8 from Vox, 25 fro AS 2024-08-19 21:39:02,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.331e+01 2.511e+01 2.867e+01 4.334e+02, threshold=5.023e+01, percent-clipped=2.0 2024-08-19 21:39:10,612 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 21:39:12,106 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9750, loss[loss=0.1039, beats_loss=0.0119, ecapa_loss=0.0001216, whisper_loss=0.09081, over 16510.00 frames. ], tot_loss[loss=0.09973, beats_loss=0.01051, ecapa_loss=0.0001423, whisper_loss=0.0878, over 3746630.19 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:39:32,173 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:39:34,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4543490.0, ans=0.125 2024-08-19 21:39:44,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2024-08-19 21:39:55,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4543590.0, ans=0.0 2024-08-19 21:39:57,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4543590.0, ans=0.125 2024-08-19 21:40:21,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4543690.0, ans=0.09899494936611666 2024-08-19 21:40:31,002 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 21:40:32,999 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-19 21:40:40,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5 2024-08-19 21:40:48,535 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9800, loss[loss=0.08148, beats_loss=0.01234, ecapa_loss=0.0001417, whisper_loss=0.06773, over 22564.00 frames. ], tot_loss[loss=0.09967, beats_loss=0.01052, ecapa_loss=0.0001414, whisper_loss=0.08773, over 3761848.09 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:40:53,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4543890.0, ans=0.125 2024-08-19 21:40:55,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=12.0 2024-08-19 21:40:59,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4543890.0, ans=0.035 2024-08-19 21:40:59,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4543890.0, ans=0.0 2024-08-19 21:41:30,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4544090.0, ans=0.0 2024-08-19 21:41:37,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5 2024-08-19 21:42:08,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.85 vs. 
limit=15.0 2024-08-19 21:42:16,137 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.232e+01 2.541e+01 2.711e+01 1.410e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-19 21:42:16,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4544290.0, ans=0.2 2024-08-19 21:42:26,648 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9850, loss[loss=0.1059, beats_loss=0.01088, ecapa_loss=0.0001162, whisper_loss=0.09382, over 23068.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01052, ecapa_loss=0.0001414, whisper_loss=0.08857, over 3781236.80 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:42:39,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4544390.0, ans=0.125 2024-08-19 21:42:41,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4544390.0, ans=0.125 2024-08-19 21:42:53,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4544490.0, ans=0.95 2024-08-19 21:43:30,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4544690.0, ans=0.1 2024-08-19 21:43:57,254 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9900, loss[loss=0.1136, beats_loss=0.01179, ecapa_loss=0.0001452, whisper_loss=0.1004, over 17816.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01063, ecapa_loss=0.0001398, whisper_loss=0.08857, over 3812378.28 frames. ], batch size: 72, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:43:59,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. 
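Editor's note: in each `optim.py` `Clipping_scale=2.0, grad-norm quartiles ... threshold=...` line, the reported threshold tracks the clipping scale times the median quartile (e.g. 2.541e+01 × 2.0 = 5.082e+01 just above, and 2.482e+01 × 2.0 = 4.965e+01 earlier). A minimal sketch of that relationship, under the assumption that the threshold is the clipping scale times the running median grad norm and that over-threshold gradients are scaled by threshold / norm; this is inferred from the logged numbers, not taken from the optimizer source.

```python
# Assumption (inferred from the log, not from optim.py): the logged threshold is
# clipping_scale * median(recent grad norms), and a gradient whose norm exceeds
# the threshold is scaled down by threshold / norm.

def clip_threshold(median_grad_norm, clipping_scale=2.0):
    """Clipping threshold implied by the logged median grad norm."""
    return clipping_scale * median_grad_norm

def grad_scale_factor(grad_norm, threshold):
    """Factor applied to the gradient (1.0 when already under the threshold)."""
    return min(1.0, threshold / grad_norm)

# Median 2.541e+01 from the quartile line above reproduces threshold=5.082e+01:
print(clip_threshold(2.541e+01))  # 50.82
# A later "Scaling gradients by 0.0923..., model_norm_threshold=50.239..." line
# is consistent with a grad norm of roughly 5.443e+02 (the max quartile that
# follows it in the log):
print(round(grad_scale_factor(5.443e+02, 50.23906326293945), 4))
```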
limit=15.0 2024-08-19 21:44:04,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4544890.0, ans=0.125 2024-08-19 21:44:07,915 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 21:44:08,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-08-19 21:44:24,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2024-08-19 21:44:49,896 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 21:45:07,878 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 21:45:11,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4545290.0, ans=0.1 2024-08-19 21:45:17,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.278e+01 2.511e+01 2.738e+01 3.827e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-19 21:45:18,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4545290.0, ans=0.2 2024-08-19 21:45:21,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4545290.0, ans=0.1 2024-08-19 21:45:25,779 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 9950, loss[loss=0.0865, beats_loss=0.01311, ecapa_loss=0.0001143, whisper_loss=0.07224, over 12355.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01066, ecapa_loss=0.000138, whisper_loss=0.08865, over 3785603.32 frames. 
], batch size: 51, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:45:28,965 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 21:45:31,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4545390.0, ans=0.125 2024-08-19 21:45:42,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0 2024-08-19 21:45:46,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4545490.0, ans=0.0 2024-08-19 21:45:57,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4545490.0, ans=0.05 2024-08-19 21:46:11,807 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 33 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 21:46:37,666 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-19 21:46:39,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4545790.0, ans=0.125 2024-08-19 21:46:39,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4545790.0, ans=0.0 2024-08-19 21:46:41,350 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 35 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 21:46:47,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=22.5 2024-08-19 21:46:56,125 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10000, loss[loss=0.1199, beats_loss=0.01163, ecapa_loss=0.000143, whisper_loss=0.1068, over 20648.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01068, ecapa_loss=0.0001374, whisper_loss=0.08976, over 3799107.61 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:47:09,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4545890.0, ans=0.0 2024-08-19 21:47:17,892 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 17 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-19 21:47:18,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4545990.0, ans=0.125 2024-08-19 21:47:22,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4545990.0, ans=0.125 2024-08-19 21:47:28,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4545990.0, ans=0.125 2024-08-19 21:47:34,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.56 vs. limit=22.5 2024-08-19 21:47:40,686 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 21:48:17,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.220e+01 2.388e+01 2.607e+01 4.207e+01, threshold=4.776e+01, percent-clipped=0.0 2024-08-19 21:48:26,805 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10050, loss[loss=0.1262, beats_loss=0.008148, ecapa_loss=0.0001237, whisper_loss=0.1168, over 23417.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001369, whisper_loss=0.09003, over 3793387.53 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:48:43,602 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
12 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 21:48:49,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4546490.0, ans=0.0 2024-08-19 21:49:18,741 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 21:49:30,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4546690.0, ans=0.025 2024-08-19 21:49:34,259 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 21:49:36,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.34 vs. limit=22.5 2024-08-19 21:49:41,761 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 21:49:49,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4546790.0, ans=0.125 2024-08-19 21:49:51,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4546790.0, ans=0.125 2024-08-19 21:49:57,089 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10100, loss[loss=0.08502, beats_loss=0.01095, ecapa_loss=0.0001397, whisper_loss=0.07268, over 12856.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001371, whisper_loss=0.09014, over 3852456.72 frames. ], batch size: 51, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:50:04,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4546890.0, ans=0.2 2024-08-19 21:50:43,138 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
35 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 21:50:48,678 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 12 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 21:51:18,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4547290.0, ans=0.0 2024-08-19 21:51:21,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.291e+01 2.540e+01 2.770e+01 3.592e+02, threshold=5.079e+01, percent-clipped=1.0 2024-08-19 21:51:27,039 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 21:51:30,717 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10150, loss[loss=0.1139, beats_loss=0.008819, ecapa_loss=0.0001551, whisper_loss=0.1035, over 13237.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001395, whisper_loss=0.09076, over 3860123.64 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:51:42,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4547390.0, ans=0.125 2024-08-19 21:51:43,851 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 21:51:55,651 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 21:51:57,748 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 23 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 21:52:00,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4547490.0, ans=0.0 2024-08-19 21:52:05,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.39 vs. limit=10.0 2024-08-19 21:52:08,332 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 21:52:14,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.41 vs. limit=22.5 2024-08-19 21:52:35,024 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-19 21:52:36,625 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 21:53:00,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-08-19 21:53:04,949 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10200, loss[loss=0.08929, beats_loss=0.01249, ecapa_loss=0.000131, whisper_loss=0.07549, over 23065.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001411, whisper_loss=0.09057, over 3866363.90 frames. ], batch size: 94, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:53:22,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.26 vs. limit=10.0 2024-08-19 21:53:26,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4547990.0, ans=0.0 2024-08-19 21:53:45,276 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 13 from LS+wenet, 27 from Vox, 19 fro AS 2024-08-19 21:53:51,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4548090.0, ans=0.0 2024-08-19 21:53:55,023 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 10 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 21:53:58,721 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
18 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-19 21:54:09,170 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.918e+01 2024-08-19 21:54:20,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4548190.0, ans=0.0 2024-08-19 21:54:25,881 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 21:54:30,093 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 15 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 21:54:32,261 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 21:54:33,827 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.328e+01 2.553e+01 2.832e+01 3.795e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-19 21:54:34,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4548290.0, ans=0.2 2024-08-19 21:54:43,480 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10250, loss[loss=0.1009, beats_loss=0.0116, ecapa_loss=0.0001326, whisper_loss=0.088, over 22121.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001416, whisper_loss=0.09105, over 3871902.76 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:54:51,023 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 16 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-19 21:55:17,726 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:55:49,512 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 21:56:00,314 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 21:56:00,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4548790.0, ans=0.1 2024-08-19 21:56:21,613 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10300, loss[loss=0.09352, beats_loss=0.0108, ecapa_loss=0.0001369, whisper_loss=0.08135, over 22263.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001412, whisper_loss=0.09136, over 3875940.53 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:56:22,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2024-08-19 21:56:33,267 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 21:56:47,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4548990.0, ans=0.0 2024-08-19 21:57:05,878 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 21:57:16,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4549090.0, ans=0.2 2024-08-19 21:57:23,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4549190.0, ans=0.0 2024-08-19 21:57:27,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4549190.0, ans=0.1 2024-08-19 21:57:46,711 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-19 21:57:47,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. 
limit=22.5 2024-08-19 21:57:50,421 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.288e+01 2.512e+01 2.813e+01 4.612e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-19 21:57:59,992 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10350, loss[loss=0.09894, beats_loss=0.009317, ecapa_loss=0.0001418, whisper_loss=0.08821, over 18725.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001427, whisper_loss=0.09079, over 3880043.89 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:58:04,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4549390.0, ans=0.125 2024-08-19 21:58:14,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4549390.0, ans=0.2 2024-08-19 21:58:18,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=8.0 2024-08-19 21:58:22,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-19 21:58:23,978 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 21:58:31,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4549490.0, ans=0.0 2024-08-19 21:58:31,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.96 vs. 
limit=15.0 2024-08-19 21:59:26,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4549790.0, ans=0.0 2024-08-19 21:59:30,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4549790.0, ans=0.05 2024-08-19 21:59:32,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4549790.0, ans=0.04949747468305833 2024-08-19 21:59:34,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4549790.0, ans=0.2 2024-08-19 21:59:36,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4549790.0, ans=0.125 2024-08-19 21:59:39,690 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10400, loss[loss=0.1162, beats_loss=0.009482, ecapa_loss=0.0001335, whisper_loss=0.1054, over 17932.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01036, ecapa_loss=0.0001425, whisper_loss=0.09103, over 3889031.81 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 21:59:40,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4549890.0, ans=0.1 2024-08-19 21:59:46,122 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.120e+05 2024-08-19 21:59:49,804 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09229938685894012, model_norm_threshold=50.23906326293945 2024-08-19 21:59:49,958 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.352e+04, grad_sumsq=4.110e+06, orig_rms_sq=1.059e-02 2024-08-19 22:00:00,035 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
26 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-19 22:00:00,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4549990.0, ans=0.1 2024-08-19 22:00:03,982 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 22:00:17,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2024-08-19 22:00:53,911 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-19 22:00:55,743 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 22:01:12,344 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 30 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 22:01:14,095 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.360e+01 2.658e+01 2.930e+01 5.443e+02, threshold=5.316e+01, percent-clipped=2.0 2024-08-19 22:01:18,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4550290.0, ans=0.125 2024-08-19 22:01:23,789 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10450, loss[loss=0.07471, beats_loss=0.01265, ecapa_loss=0.0001471, whisper_loss=0.06059, over 20329.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001432, whisper_loss=0.0907, over 3862301.26 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:01:26,052 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 34 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 22:01:29,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. 
limit=22.5 2024-08-19 22:01:37,006 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 21 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-19 22:02:02,653 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 23 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 22:02:10,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4550590.0, ans=0.0 2024-08-19 22:02:23,802 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 22:02:50,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4550790.0, ans=0.0 2024-08-19 22:03:09,019 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10500, loss[loss=0.1026, beats_loss=0.009036, ecapa_loss=0.0001561, whisper_loss=0.092, over 23095.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001427, whisper_loss=0.09128, over 3854832.02 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:03:14,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4550890.0, ans=0.0 2024-08-19 22:04:03,836 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 29 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-19 22:04:13,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.60 vs. limit=15.0 2024-08-19 22:04:38,000 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 26 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 22:04:43,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.385e+01 2.732e+01 3.050e+01 1.536e+02, threshold=5.464e+01, percent-clipped=1.0 2024-08-19 22:04:46,263 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 
12 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-19 22:04:55,216 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10550, loss[loss=0.1241, beats_loss=0.00809, ecapa_loss=0.0001521, whisper_loss=0.1145, over 17336.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.0001424, whisper_loss=0.0912, over 3818411.44 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:05:42,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4551590.0, ans=0.09899494936611666 2024-08-19 22:05:42,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4551590.0, ans=0.125 2024-08-19 22:05:55,981 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.998e-02 2024-08-19 22:06:23,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5 2024-08-19 22:06:41,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4551790.0, ans=0.0 2024-08-19 22:06:46,833 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10600, loss[loss=0.06466, beats_loss=0.01315, ecapa_loss=0.0001842, whisper_loss=0.04967, over 17359.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01032, ecapa_loss=0.0001421, whisper_loss=0.09116, over 3802821.41 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:06:50,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. 
limit=15.0 2024-08-19 22:07:07,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4551990.0, ans=0.0 2024-08-19 22:07:28,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4552090.0, ans=10.0 2024-08-19 22:07:43,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4552090.0, ans=0.1 2024-08-19 22:07:51,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4552190.0, ans=0.0 2024-08-19 22:08:03,158 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 17 from LS+wenet, 13 from Vox, 46 fro AS 2024-08-19 22:08:03,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4552190.0, ans=0.125 2024-08-19 22:08:16,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4552290.0, ans=0.125 2024-08-19 22:08:21,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.245e+01 2.495e+01 2.796e+01 7.063e+01, threshold=4.991e+01, percent-clipped=1.0 2024-08-19 22:08:24,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4552290.0, ans=0.2 2024-08-19 22:08:32,211 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10650, loss[loss=0.1198, beats_loss=0.009496, ecapa_loss=0.0001468, whisper_loss=0.1088, over 21489.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.08987, over 3812524.97 frames. 
], batch size: 87, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:09:02,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4552490.0, ans=0.1 2024-08-19 22:09:04,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-19 22:09:04,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2024-08-19 22:09:37,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4552690.0, ans=0.125 2024-08-19 22:10:00,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4552790.0, ans=0.125 2024-08-19 22:10:09,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4552790.0, ans=0.1 2024-08-19 22:10:11,007 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 18 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 22:10:15,045 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10700, loss[loss=0.1095, beats_loss=0.01129, ecapa_loss=0.0001472, whisper_loss=0.09669, over 13146.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001405, whisper_loss=0.08997, over 3816110.81 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:11:04,569 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 22:11:19,603 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 22:11:47,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.77 vs. limit=10.0 2024-08-19 22:11:54,021 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.310e+01 2.569e+01 2.819e+01 4.934e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-19 22:11:57,001 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 22:12:01,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4553290.0, ans=0.1 2024-08-19 22:12:03,572 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10750, loss[loss=0.1146, beats_loss=0.008711, ecapa_loss=0.0001675, whisper_loss=0.1042, over 15149.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001417, whisper_loss=0.08976, over 3810985.37 frames. ], batch size: 63, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:12:04,080 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 29 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 22:12:06,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4553390.0, ans=0.0 2024-08-19 22:12:06,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4553390.0, ans=0.0 2024-08-19 22:12:18,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4553390.0, ans=0.2 2024-08-19 22:12:23,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4553490.0, ans=0.1 2024-08-19 22:12:30,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.72 vs. 
limit=15.0 2024-08-19 22:13:21,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4553690.0, ans=0.07 2024-08-19 22:13:41,507 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 22:13:44,352 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10800, loss[loss=0.1204, beats_loss=0.009094, ecapa_loss=0.0001379, whisper_loss=0.1099, over 23069.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01056, ecapa_loss=0.0001414, whisper_loss=0.08876, over 3817757.07 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:13:46,030 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 24 from LS+wenet, 32 from Vox, 40 fro AS 2024-08-19 22:13:55,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4553890.0, ans=0.0 2024-08-19 22:13:59,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.00 vs. limit=10.0 2024-08-19 22:14:01,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4553890.0, ans=0.125 2024-08-19 22:14:09,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4553990.0, ans=0.1 2024-08-19 22:14:30,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2024-08-19 22:14:55,023 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
27 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 22:15:16,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.322e+01 2.506e+01 2.895e+01 8.208e+01, threshold=5.011e+01, percent-clipped=1.0 2024-08-19 22:15:19,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4554290.0, ans=0.0 2024-08-19 22:15:26,479 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10850, loss[loss=0.09858, beats_loss=0.01076, ecapa_loss=0.0001406, whisper_loss=0.08641, over 19183.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01055, ecapa_loss=0.0001408, whisper_loss=0.08898, over 3811946.88 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:15:27,375 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 23 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 22:15:28,905 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 9 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 22:15:33,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2024-08-19 22:16:05,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4554590.0, ans=0.0 2024-08-19 22:16:26,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4554690.0, ans=0.125 2024-08-19 22:16:39,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. 
limit=15.0 2024-08-19 22:16:51,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4554790.0, ans=0.125 2024-08-19 22:17:11,308 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10900, loss[loss=0.09762, beats_loss=0.009455, ecapa_loss=0.0001676, whisper_loss=0.08649, over 16382.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001408, whisper_loss=0.08938, over 3797595.87 frames. ], batch size: 66, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:17:22,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4554890.0, ans=0.025 2024-08-19 22:17:45,023 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 22:18:15,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.81 vs. limit=8.0 2024-08-19 22:18:26,225 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 22:18:51,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.334e+01 2.577e+01 2.862e+01 5.392e+01, threshold=5.154e+01, percent-clipped=2.0 2024-08-19 22:19:03,227 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 10950, loss[loss=0.09786, beats_loss=0.01112, ecapa_loss=0.000153, whisper_loss=0.08521, over 21768.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08996, over 3820598.60 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:19:03,403 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
34 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 22:19:10,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4555390.0, ans=0.125 2024-08-19 22:20:22,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4555690.0, ans=0.125 2024-08-19 22:20:34,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4555790.0, ans=0.125 2024-08-19 22:20:41,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5 2024-08-19 22:20:43,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4555790.0, ans=0.1 2024-08-19 22:20:57,494 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11000, loss[loss=0.1254, beats_loss=0.009723, ecapa_loss=0.0001557, whisper_loss=0.1142, over 14983.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001405, whisper_loss=0.09032, over 3818123.77 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:21:02,131 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 22:21:36,370 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 22:21:47,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4556090.0, ans=0.1 2024-08-19 22:22:04,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4556190.0, ans=0.125 2024-08-19 22:22:40,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.290e+01 2.533e+01 2.932e+01 4.213e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-19 22:22:51,787 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11050, loss[loss=0.1139, beats_loss=0.009721, ecapa_loss=0.0001708, whisper_loss=0.1024, over 22065.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.09016, over 3823742.60 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:23:01,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4556390.0, ans=0.125 2024-08-19 22:23:26,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4556490.0, ans=0.2 2024-08-19 22:24:01,763 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:24:20,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4556690.0, ans=0.125 2024-08-19 22:24:20,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4556690.0, ans=0.125 2024-08-19 22:24:45,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4556890.0, ans=0.1 2024-08-19 22:24:45,934 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11100, 
loss[loss=0.1033, beats_loss=0.01125, ecapa_loss=0.0001167, whisper_loss=0.09084, over 18183.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001409, whisper_loss=0.09093, over 3856007.17 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:24:53,847 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 27 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-19 22:25:10,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4556990.0, ans=0.125 2024-08-19 22:25:14,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4556990.0, ans=0.1 2024-08-19 22:25:24,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4556990.0, ans=0.0 2024-08-19 22:25:26,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4556990.0, ans=0.2 2024-08-19 22:25:28,406 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 32 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-19 22:25:35,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4557090.0, ans=0.0 2024-08-19 22:25:44,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2024-08-19 22:25:45,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=4557090.0, ans=0.025 2024-08-19 22:25:58,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4557190.0, ans=0.125 2024-08-19 22:26:00,926 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 22:26:03,321 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 18 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 22:26:04,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=15.0 2024-08-19 22:26:22,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4557290.0, ans=0.125 2024-08-19 22:26:30,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.214e+01 2.398e+01 2.634e+01 3.893e+01, threshold=4.795e+01, percent-clipped=0.0 2024-08-19 22:26:41,236 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11150, loss[loss=0.07116, beats_loss=0.01233, ecapa_loss=0.0001382, whisper_loss=0.05745, over 16422.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001401, whisper_loss=0.09075, over 3819997.91 frames. ], batch size: 67, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:26:47,198 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 17 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 22:26:54,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4557390.0, ans=0.0 2024-08-19 22:27:02,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4557490.0, ans=0.2 2024-08-19 22:27:12,017 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 
19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 22:27:25,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4557590.0, ans=0.1 2024-08-19 22:27:29,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4557590.0, ans=0.2 2024-08-19 22:27:46,850 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 22:27:59,957 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 22:28:13,966 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 23 from LS+wenet, 11 from Vox, 19 fro AS 2024-08-19 22:28:16,371 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-19 22:28:18,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4557790.0, ans=0.125 2024-08-19 22:28:39,597 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11200, loss[loss=0.1018, beats_loss=0.009672, ecapa_loss=0.0001464, whisper_loss=0.0907, over 19496.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.09063, over 3815936.73 frames. ], batch size: 78, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:28:45,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4557890.0, ans=0.1 2024-08-19 22:29:10,321 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 21 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-19 22:29:13,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.86 vs. limit=10.0 2024-08-19 22:29:25,743 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 22:29:35,777 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 16 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-19 22:29:43,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4558090.0, ans=0.0 2024-08-19 22:30:00,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2024-08-19 22:30:14,333 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 22:30:26,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4558290.0, ans=0.04949747468305833 2024-08-19 22:30:34,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.356e+01 2.585e+01 2.931e+01 9.399e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-19 22:30:34,506 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 32 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 22:30:45,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.14 vs. limit=15.0 2024-08-19 22:30:47,239 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11250, loss[loss=0.1142, beats_loss=0.009655, ecapa_loss=0.0001355, whisper_loss=0.1032, over 14814.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001408, whisper_loss=0.09067, over 3815904.01 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:31:14,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4558490.0, ans=0.2 2024-08-19 22:31:21,925 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
27 from LS+wenet, 7 from Vox, 42 fro AS 2024-08-19 22:31:28,787 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 28 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 22:31:48,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4558590.0, ans=0.07 2024-08-19 22:32:45,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=12.0 2024-08-19 22:32:46,947 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11300, loss[loss=0.1058, beats_loss=0.01138, ecapa_loss=0.0001298, whisper_loss=0.09317, over 18066.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001413, whisper_loss=0.09068, over 3791422.55 frames. ], batch size: 72, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:33:05,735 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 24 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 22:34:21,976 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 25 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 22:34:27,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4559290.0, ans=0.1 2024-08-19 22:34:32,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4559290.0, ans=0.125 2024-08-19 22:34:35,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4559290.0, ans=0.125 2024-08-19 22:34:36,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.198e+01 2.361e+01 2.678e+01 4.685e+01, threshold=4.723e+01, percent-clipped=0.0 2024-08-19 22:34:48,383 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11350, loss[loss=0.1111, beats_loss=0.01017, ecapa_loss=0.0001264, whisper_loss=0.09967, over 18107.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001409, whisper_loss=0.09103, over 3808318.51 frames. ], batch size: 69, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:35:23,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4559490.0, ans=0.125 2024-08-19 22:36:01,430 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 22:36:51,798 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11400, loss[loss=0.1087, beats_loss=0.01142, ecapa_loss=0.0001395, whisper_loss=0.09585, over 22487.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01024, ecapa_loss=0.0001406, whisper_loss=0.09184, over 3831977.20 frames. ], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:36:58,613 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 21 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 22:36:59,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.20 vs. limit=15.0 2024-08-19 22:37:06,576 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 22:37:17,058 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-456000.pt 2024-08-19 22:37:53,080 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 32 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 22:38:11,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-08-19 22:38:22,500 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 
25 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-19 22:38:44,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.308e+01 2.574e+01 2.916e+01 2.255e+02, threshold=5.148e+01, percent-clipped=1.0 2024-08-19 22:38:56,445 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11450, loss[loss=0.08705, beats_loss=0.01102, ecapa_loss=0.0001466, whisper_loss=0.07456, over 20077.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.09117, over 3857478.83 frames. ], batch size: 83, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:39:23,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4560490.0, ans=0.125 2024-08-19 22:39:39,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4560490.0, ans=0.0 2024-08-19 22:40:02,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4560590.0, ans=0.125 2024-08-19 22:40:04,622 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 22:40:20,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4560690.0, ans=0.125 2024-08-19 22:40:27,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4560690.0, ans=0.125 2024-08-19 22:40:30,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4560790.0, ans=0.1 2024-08-19 22:40:33,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4560790.0, ans=0.0 2024-08-19 22:40:45,677 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 22:40:45,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4560790.0, ans=0.125 2024-08-19 22:40:55,979 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11500, loss[loss=0.1076, beats_loss=0.009502, ecapa_loss=0.0001459, whisper_loss=0.09661, over 15698.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001403, whisper_loss=0.09081, over 3829570.09 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:41:07,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4560890.0, ans=0.1 2024-08-19 22:41:29,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4560990.0, ans=0.0 2024-08-19 22:41:45,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-19 22:41:47,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-08-19 22:41:50,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-19 22:42:39,012 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
33 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-19 22:42:39,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.481e+01 2.784e+01 2.057e+02, threshold=4.962e+01, percent-clipped=3.0 2024-08-19 22:42:41,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4561290.0, ans=0.0 2024-08-19 22:42:50,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4561390.0, ans=0.125 2024-08-19 22:42:51,360 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11550, loss[loss=0.09244, beats_loss=0.01181, ecapa_loss=0.0001342, whisper_loss=0.07928, over 21879.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001401, whisper_loss=0.09079, over 3842647.27 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:42:55,639 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 26 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 22:44:02,907 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 22:44:43,862 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11600, loss[loss=0.09476, beats_loss=0.011, ecapa_loss=0.0001248, whisper_loss=0.08251, over 22112.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001398, whisper_loss=0.09044, over 3868802.03 frames. ], batch size: 87, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:44:47,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4561890.0, ans=0.1 2024-08-19 22:45:00,798 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 27 from LS+wenet, 9 from Vox, 28 fro AS 2024-08-19 22:45:15,782 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 22:45:22,730 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 17 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-19 22:45:42,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4562190.0, ans=0.0 2024-08-19 22:45:47,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-08-19 22:46:00,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4562290.0, ans=0.125 2024-08-19 22:46:02,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.76 vs. limit=10.0 2024-08-19 22:46:04,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4562290.0, ans=0.07 2024-08-19 22:46:10,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.615e+01 2.277e+01 2.484e+01 2.833e+01 4.332e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 22:46:10,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4562290.0, ans=0.1 2024-08-19 22:46:19,760 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11650, loss[loss=0.1155, beats_loss=0.0123, ecapa_loss=0.0001376, whisper_loss=0.1019, over 21741.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.09067, over 3817084.55 frames. ], batch size: 84, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:46:29,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=4562390.0, ans=0.2 2024-08-19 22:46:32,761 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
19 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-19 22:46:34,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4562390.0, ans=0.035 2024-08-19 22:46:43,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=4562490.0, ans=15.0 2024-08-19 22:47:15,302 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 20 from LS+wenet, 18 from Vox, 14 fro AS 2024-08-19 22:47:21,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4562690.0, ans=0.125 2024-08-19 22:47:32,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4562690.0, ans=0.1 2024-08-19 22:47:50,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4562790.0, ans=0.125 2024-08-19 22:47:54,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.05 vs. limit=22.5 2024-08-19 22:47:57,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4562790.0, ans=0.2 2024-08-19 22:48:01,565 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11700, loss[loss=0.1037, beats_loss=0.01142, ecapa_loss=0.0001148, whisper_loss=0.09108, over 14440.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.0901, over 3814517.40 frames. ], batch size: 55, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:48:06,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. 
limit=22.5 2024-08-19 22:48:17,247 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 22:48:19,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4562890.0, ans=0.0 2024-08-19 22:48:47,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4563090.0, ans=0.0 2024-08-19 22:48:49,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4563090.0, ans=0.0 2024-08-19 22:48:50,377 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-19 22:49:06,246 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 17 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-19 22:49:10,423 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 22:49:10,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4563190.0, ans=0.04949747468305833 2024-08-19 22:49:10,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4563190.0, ans=0.0 2024-08-19 22:49:12,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4563190.0, ans=0.125 2024-08-19 22:49:19,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4563190.0, ans=15.0 2024-08-19 22:49:29,493 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
30 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 22:49:34,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.355e+01 2.582e+01 2.983e+01 7.922e+01, threshold=5.164e+01, percent-clipped=1.0 2024-08-19 22:49:47,377 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11750, loss[loss=0.1297, beats_loss=0.01035, ecapa_loss=0.0001039, whisper_loss=0.1183, over 23964.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.08983, over 3790168.94 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-19 22:49:58,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4563390.0, ans=0.125 2024-08-19 22:50:22,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4563490.0, ans=0.125 2024-08-19 22:50:26,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=4563490.0, ans=0.1 2024-08-19 22:51:01,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4563690.0, ans=0.1 2024-08-19 22:51:21,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4563790.0, ans=0.2 2024-08-19 22:51:39,030 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 22:51:42,167 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11800, loss[loss=0.05319, beats_loss=0.01498, ecapa_loss=7.647e-05, whisper_loss=0.03744, over 16016.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.09034, over 3800948.98 frames. 
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:51:43,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4563890.0, ans=0.2 2024-08-19 22:51:52,264 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 32 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 22:52:02,900 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 27 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 22:52:05,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4563990.0, ans=0.1 2024-08-19 22:52:11,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4563990.0, ans=0.125 2024-08-19 22:52:56,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4564190.0, ans=0.1 2024-08-19 22:52:59,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4564190.0, ans=0.1 2024-08-19 22:53:05,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4564190.0, ans=0.125 2024-08-19 22:53:24,440 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.230e+01 2.402e+01 2.738e+01 3.572e+01, threshold=4.803e+01, percent-clipped=0.0 2024-08-19 22:53:33,760 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11850, loss[loss=0.1195, beats_loss=0.0097, ecapa_loss=0.0001458, whisper_loss=0.1083, over 23125.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001394, whisper_loss=0.09014, over 3839671.09 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:54:11,724 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
17 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 22:54:32,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4564590.0, ans=0.125 2024-08-19 22:54:41,864 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 22:54:44,182 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 22:55:11,259 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 22:55:15,437 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 25 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-19 22:55:24,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4564890.0, ans=0.0 2024-08-19 22:55:25,496 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11900, loss[loss=0.1158, beats_loss=0.009962, ecapa_loss=0.0001689, whisper_loss=0.1042, over 22627.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001404, whisper_loss=0.08994, over 3830180.12 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:55:30,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4564890.0, ans=0.125 2024-08-19 22:55:37,642 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 22:55:37,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4564890.0, ans=0.2 2024-08-19 22:55:40,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.50 vs. 
limit=22.5 2024-08-19 22:56:46,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=22.5 2024-08-19 22:56:50,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-19 22:57:06,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+01 2.317e+01 2.609e+01 2.861e+01 6.356e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-19 22:57:15,622 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 11950, loss[loss=0.1048, beats_loss=0.0082, ecapa_loss=0.0001397, whisper_loss=0.09525, over 21390.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.08994, over 3843985.78 frames. ], batch size: 83, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:57:18,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4565390.0, ans=0.125 2024-08-19 22:57:18,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4565390.0, ans=0.0 2024-08-19 22:57:19,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-08-19 22:57:41,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. 
limit=15.0 2024-08-19 22:57:53,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4565490.0, ans=0.2 2024-08-19 22:58:08,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4565590.0, ans=0.0 2024-08-19 22:58:58,904 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12000, loss[loss=0.1141, beats_loss=0.009496, ecapa_loss=0.0001321, whisper_loss=0.1033, over 13483.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.09054, over 3862666.42 frames. ], batch size: 54, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 22:58:58,906 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-19 22:59:35,943 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005134, whisper_loss=0.2483, over 931116.00 frames. 2024-08-19 23:00:01,593 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on SV_voxceleb1: loss=0.003987, beats_loss=0, ecapa_loss=0.0003987, whisper_loss=0, over 944235.00 frames. 2024-08-19 23:00:14,549 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9953, 2.8504, 3.3435, 3.0610], device='cuda:0') 2024-08-19 23:01:39,465 INFO [train_multi_KD3.py:1150] (0/4) Epoch 31, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 23:01:39,470 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-19 23:01:41,202 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
14 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 23:01:49,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4565890.0, ans=0.2 2024-08-19 23:02:06,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4565990.0, ans=0.125 2024-08-19 23:02:06,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4565990.0, ans=0.1 2024-08-19 23:02:08,843 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 23:02:11,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4565990.0, ans=0.125 2024-08-19 23:02:29,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2024-08-19 23:02:47,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4566190.0, ans=10.0 2024-08-19 23:03:21,837 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.392e+01 2.640e+01 2.904e+01 4.083e+01, threshold=5.280e+01, percent-clipped=0.0 2024-08-19 23:03:31,862 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12050, loss[loss=0.1016, beats_loss=0.01025, ecapa_loss=0.0001364, whisper_loss=0.09002, over 18632.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.09024, over 3847022.91 frames. ], batch size: 73, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:03:37,584 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-19 23:03:40,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4566390.0, ans=0.125 2024-08-19 23:03:40,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=12.0 2024-08-19 23:03:44,045 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 23:03:55,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4566490.0, ans=0.125 2024-08-19 23:04:10,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4566490.0, ans=0.125 2024-08-19 23:04:12,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4566490.0, ans=0.125 2024-08-19 23:04:16,993 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 10 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 23:05:04,270 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 20 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 23:05:11,022 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04405633732676506, model_norm_threshold=52.79664611816406 2024-08-19 23:05:11,177 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.250e+05, grad_sumsq=2.117e+07, orig_rms_sq=1.063e-02 2024-08-19 23:05:19,534 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12100, loss[loss=0.0733, beats_loss=0.009603, ecapa_loss=0.0001598, whisper_loss=0.0621, over 15200.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001394, whisper_loss=0.08977, over 3854756.13 frames. 
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:05:49,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-19 23:05:53,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4566990.0, ans=0.125 2024-08-19 23:06:07,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-08-19 23:06:10,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4567090.0, ans=0.125 2024-08-19 23:06:12,447 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 23:06:12,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4567090.0, ans=0.2 2024-08-19 23:06:27,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4567190.0, ans=0.0 2024-08-19 23:06:38,610 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 23:06:53,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2024-08-19 23:06:57,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.332e+01 2.618e+01 3.087e+01 1.198e+03, threshold=5.236e+01, percent-clipped=2.0 2024-08-19 23:07:06,256 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12150, loss[loss=0.1052, beats_loss=0.01009, ecapa_loss=0.0001561, whisper_loss=0.09354, over 15136.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.09047, over 3829438.22 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:07:07,163 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 23:07:31,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4567490.0, ans=0.0 2024-08-19 23:07:53,154 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 16 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 23:07:57,241 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 23:08:01,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4567590.0, ans=0.09899494936611666 2024-08-19 23:08:26,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4567690.0, ans=0.2 2024-08-19 23:08:26,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=12.0 2024-08-19 23:08:33,211 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 18 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 23:08:45,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4567790.0, ans=0.0 2024-08-19 23:08:49,857 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12200, loss[loss=0.1104, beats_loss=0.01024, ecapa_loss=0.0001505, whisper_loss=0.09866, over 16287.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001401, whisper_loss=0.09016, over 3810105.64 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:08:55,349 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
18 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 23:09:01,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4567890.0, ans=0.2 2024-08-19 23:09:21,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4567990.0, ans=0.1 2024-08-19 23:10:18,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.296e+01 2.498e+01 2.784e+01 3.860e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 23:10:26,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2024-08-19 23:10:26,696 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12250, loss[loss=0.08551, beats_loss=0.009944, ecapa_loss=0.0001068, whisper_loss=0.07449, over 15836.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.08944, over 3770716.67 frames. ], batch size: 59, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:10:28,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4568390.0, ans=0.0 2024-08-19 23:10:50,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4568490.0, ans=0.125 2024-08-19 23:11:14,145 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 23:11:23,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4568590.0, ans=0.125 2024-08-19 23:11:40,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4568690.0, ans=0.5 2024-08-19 23:11:51,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4568790.0, ans=0.1 2024-08-19 23:12:03,946 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12300, loss[loss=0.09829, beats_loss=0.01027, ecapa_loss=0.0001307, whisper_loss=0.08672, over 16592.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08924, over 3791332.66 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:12:15,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4568890.0, ans=0.2 2024-08-19 23:12:17,002 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 17 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 23:12:18,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4568890.0, ans=0.125 2024-08-19 23:12:30,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4568990.0, ans=0.125 2024-08-19 23:12:38,026 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 23:13:34,109 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.176e+01 2.435e+01 2.712e+01 4.279e+01, threshold=4.869e+01, percent-clipped=0.0 2024-08-19 23:13:42,437 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12350, loss[loss=0.1011, beats_loss=0.009298, ecapa_loss=0.0001706, whisper_loss=0.09011, over 13357.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001394, whisper_loss=0.09048, over 3804979.04 frames. ], batch size: 57, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:13:59,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-19 23:14:28,593 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 23:15:01,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4569690.0, ans=0.0 2024-08-19 23:15:05,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4569790.0, ans=0.125 2024-08-19 23:15:14,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4569790.0, ans=0.2 2024-08-19 23:15:24,711 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12400, loss[loss=0.08882, beats_loss=0.01069, ecapa_loss=0.0001321, whisper_loss=0.07682, over 22637.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001399, whisper_loss=0.09062, over 3810114.25 frames. ], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:15:58,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4569990.0, ans=0.0 2024-08-19 23:16:13,547 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
13 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 23:16:22,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-19 23:16:33,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2024-08-19 23:16:38,062 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 23:16:45,769 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 33 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 23:16:57,899 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 26 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 23:16:58,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4570290.0, ans=0.0 2024-08-19 23:17:00,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.374e+01 2.637e+01 2.909e+01 4.258e+01, threshold=5.273e+01, percent-clipped=0.0 2024-08-19 23:17:06,303 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 23:17:09,504 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12450, loss[loss=0.1013, beats_loss=0.01223, ecapa_loss=0.0001281, whisper_loss=0.08782, over 19961.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.09009, over 3829000.16 frames. ], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:17:47,870 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
14 from LS+wenet, 18 from Vox, 31 from AS 2024-08-19 23:17:56,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4570590.0, ans=0.125 2024-08-19 23:18:02,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4570590.0, ans=0.125 2024-08-19 23:18:06,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4570590.0, ans=0.1 2024-08-19 23:18:51,833 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12500, loss[loss=0.07821, beats_loss=0.01215, ecapa_loss=0.0001088, whisper_loss=0.06497, over 13793.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.09007, over 3818422.09 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:19:00,818 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS 2024-08-19 23:19:11,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4570990.0, ans=0.0 2024-08-19 23:19:59,950 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts.
27 from LS+wenet, 23 from Vox, 38 from AS 2024-08-19 23:20:19,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4571190.0, ans=0.0 2024-08-19 23:20:20,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4571190.0, ans=0.0 2024-08-19 23:20:37,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.234e+01 2.500e+01 2.814e+01 4.349e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-19 23:20:47,941 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12550, loss[loss=0.08879, beats_loss=0.0107, ecapa_loss=0.0001224, whisper_loss=0.07687, over 16063.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.09035, over 3831215.46 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:21:38,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.76 vs. limit=22.5 2024-08-19 23:22:13,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=4571690.0, ans=15.0 2024-08-19 23:22:22,628 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 from AS 2024-08-19 23:22:29,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2024-08-19 23:22:37,394 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12600, loss[loss=0.08097, beats_loss=0.01143, ecapa_loss=0.0001509, whisper_loss=0.06802, over 21000.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.0904, over 3866681.34 frames.
], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:23:04,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.13 vs. limit=10.0 2024-08-19 23:23:11,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4571990.0, ans=0.0 2024-08-19 23:23:26,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=12.0 2024-08-19 23:23:38,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4572090.0, ans=0.125 2024-08-19 23:23:57,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4572190.0, ans=0.2 2024-08-19 23:24:00,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-08-19 23:24:32,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.292e+01 2.504e+01 2.662e+01 4.267e+01, threshold=5.008e+01, percent-clipped=0.0 2024-08-19 23:24:43,482 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12650, loss[loss=0.1005, beats_loss=0.009594, ecapa_loss=0.0001487, whisper_loss=0.08938, over 22297.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.09004, over 3860366.23 frames. 
], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:24:52,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4572390.0, ans=10.0 2024-08-19 23:25:14,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4572490.0, ans=0.0 2024-08-19 23:25:28,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4572490.0, ans=0.1 2024-08-19 23:25:35,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-08-19 23:25:45,369 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 32 from LS+wenet, 21 from Vox, 28 from AS 2024-08-19 23:25:55,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4572690.0, ans=0.125 2024-08-19 23:26:15,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4572690.0, ans=0.125 2024-08-19 23:26:19,426 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 20 from LS+wenet, 31 from Vox, 14 from AS 2024-08-19 23:26:39,133 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12700, loss[loss=0.1048, beats_loss=0.008497, ecapa_loss=0.0001146, whisper_loss=0.09518, over 15436.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001402, whisper_loss=0.08978, over 3873426.94 frames.
], batch size: 55, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:26:42,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4572890.0, ans=0.125 2024-08-19 23:27:03,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4572990.0, ans=0.125 2024-08-19 23:27:14,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4572990.0, ans=0.0 2024-08-19 23:27:32,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4573090.0, ans=0.2 2024-08-19 23:27:45,864 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 from AS 2024-08-19 23:27:50,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4573190.0, ans=0.125 2024-08-19 23:27:59,962 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-19 23:28:13,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4573290.0, ans=0.125 2024-08-19 23:28:25,363 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.359e+01 2.539e+01 2.793e+01 4.602e+02, threshold=5.078e+01, percent-clipped=1.0 2024-08-19 23:28:34,455 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12750, loss[loss=0.09487, beats_loss=0.01264, ecapa_loss=0.0001511, whisper_loss=0.08072, over 14994.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001401, whisper_loss=0.0894, over 3877131.34 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:28:56,927 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts.
16 from LS+wenet, 9 from Vox, 28 from AS 2024-08-19 23:29:29,578 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 10 from Vox, 28 from AS 2024-08-19 23:29:29,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4573590.0, ans=0.0 2024-08-19 23:29:35,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.90 vs. limit=22.5 2024-08-19 23:29:40,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4573590.0, ans=0.025 2024-08-19 23:30:07,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-19 23:30:10,947 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 22 from LS+wenet, 12 from Vox, 41 from AS 2024-08-19 23:30:34,027 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12800, loss[loss=0.112, beats_loss=0.01002, ecapa_loss=0.0001193, whisper_loss=0.1008, over 16874.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.0001388, whisper_loss=0.08921, over 3830739.14 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:30:50,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4573890.0, ans=0.1 2024-08-19 23:31:38,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs.
limit=6.0 2024-08-19 23:31:47,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4574190.0, ans=0.125 2024-08-19 23:31:56,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-19 23:32:01,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4574190.0, ans=0.0 2024-08-19 23:32:15,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4574290.0, ans=6.0 2024-08-19 23:32:27,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.310e+01 2.474e+01 2.739e+01 4.049e+01, threshold=4.949e+01, percent-clipped=0.0 2024-08-19 23:32:38,880 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12850, loss[loss=0.1033, beats_loss=0.01097, ecapa_loss=0.0001219, whisper_loss=0.09107, over 21672.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0106, ecapa_loss=0.0001388, whisper_loss=0.08909, over 3844276.26 frames. ], batch size: 81, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:32:54,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4574390.0, ans=0.1 2024-08-19 23:33:00,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4574390.0, ans=0.05 2024-08-19 23:33:01,871 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 25 from LS+wenet, 12 from Vox, 31 from AS 2024-08-19 23:33:57,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs.
limit=15.0 2024-08-19 23:34:21,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4574790.0, ans=0.125 2024-08-19 23:34:21,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=22.5 2024-08-19 23:34:29,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4574790.0, ans=0.05 2024-08-19 23:34:43,116 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12900, loss[loss=0.09825, beats_loss=0.00999, ecapa_loss=0.0001624, whisper_loss=0.08664, over 19597.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001391, whisper_loss=0.08993, over 3840297.34 frames. ], batch size: 79, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:35:29,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. 
limit=15.0 2024-08-19 23:35:31,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4575090.0, ans=0.125 2024-08-19 23:35:38,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4575090.0, ans=0.1 2024-08-19 23:35:53,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4575090.0, ans=0.125 2024-08-19 23:35:57,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4575190.0, ans=0.125 2024-08-19 23:36:05,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4575190.0, ans=0.1 2024-08-19 23:36:34,927 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.306e+01 2.601e+01 3.029e+01 4.481e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-19 23:36:36,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4575290.0, ans=0.125 2024-08-19 23:36:44,571 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 12950, loss[loss=0.1232, beats_loss=0.009585, ecapa_loss=0.0001338, whisper_loss=0.1123, over 16549.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001397, whisper_loss=0.08952, over 3803436.96 frames. 
], batch size: 63, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:36:55,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4575390.0, ans=0.125 2024-08-19 23:37:03,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4575390.0, ans=0.125 2024-08-19 23:37:05,770 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 21 from LS+wenet, 9 from Vox, 28 from AS 2024-08-19 23:37:36,637 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 33 from LS+wenet, 23 from Vox, 40 from AS 2024-08-19 23:37:36,849 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.487e-02 2024-08-19 23:37:59,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4575690.0, ans=0.07 2024-08-19 23:38:25,098 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 from AS 2024-08-19 23:38:27,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4575790.0, ans=0.1 2024-08-19 23:38:34,434 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 from AS 2024-08-19 23:38:41,759 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 13 from Vox, 49 from AS 2024-08-19 23:38:42,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2024-08-19 23:38:42,772 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13000, loss[loss=0.09849, beats_loss=0.01328, ecapa_loss=9.565e-05, whisper_loss=0.08425, over 22191.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01059, ecapa_loss=0.00014, whisper_loss=0.08972, over 3796088.92 frames.
], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:39:01,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4575890.0, ans=0.125 2024-08-19 23:39:07,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4575990.0, ans=0.0 2024-08-19 23:39:25,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0 2024-08-19 23:39:39,591 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 14 from LS+wenet, 13 from Vox, 22 from AS 2024-08-19 23:39:42,016 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 from AS 2024-08-19 23:40:03,966 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 27 from Vox, 20 from AS 2024-08-19 23:40:16,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4576290.0, ans=0.125 2024-08-19 23:40:19,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4576290.0, ans=0.125 2024-08-19 23:40:19,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4576290.0, ans=0.125 2024-08-19 23:40:30,070 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.284e+01 2.434e+01 2.790e+01 4.214e+01, threshold=4.868e+01, percent-clipped=0.0 2024-08-19 23:40:38,329 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13050, loss[loss=0.08905, beats_loss=0.01108, ecapa_loss=0.0001065, whisper_loss=0.07691, over 15817.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01064, ecapa_loss=0.0001415, whisper_loss=0.08894, over 3813376.71 frames.
], batch size: 58, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:40:39,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4576390.0, ans=0.0 2024-08-19 23:41:18,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4576590.0, ans=0.0 2024-08-19 23:41:21,136 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 15 from LS+wenet, 19 from Vox, 28 from AS 2024-08-19 23:41:21,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4576590.0, ans=0.1 2024-08-19 23:41:22,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4576590.0, ans=0.125 2024-08-19 23:41:23,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.86 vs. limit=15.0 2024-08-19 23:41:23,926 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 from AS 2024-08-19 23:41:35,925 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06296969205141068, model_norm_threshold=48.684600830078125 2024-08-19 23:41:36,082 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.824e+04, grad_sumsq=7.824e+04, orig_rms_sq=1.000e+00 2024-08-19 23:42:15,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4576790.0, ans=0.1 2024-08-19 23:42:16,435 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts.
26 from LS+wenet, 13 from Vox, 36 from AS 2024-08-19 23:42:22,009 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13100, loss[loss=0.09386, beats_loss=0.01093, ecapa_loss=0.0001274, whisper_loss=0.08166, over 18828.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01062, ecapa_loss=0.000142, whisper_loss=0.08893, over 3787254.11 frames. ], batch size: 77, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:42:30,003 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 26 from LS+wenet, 18 from Vox, 37 from AS 2024-08-19 23:42:34,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4576890.0, ans=0.0 2024-08-19 23:42:40,401 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 20 from Vox, 40 from AS 2024-08-19 23:42:53,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4576990.0, ans=0.125 2024-08-19 23:43:31,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4577190.0, ans=0.125 2024-08-19 23:44:01,447 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 from AS 2024-08-19 23:44:02,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.292e+01 2.525e+01 2.909e+01 7.731e+02, threshold=5.050e+01, percent-clipped=3.0 2024-08-19 23:44:05,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4577290.0, ans=0.0 2024-08-19 23:44:10,311 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13150, loss[loss=0.08921, beats_loss=0.007442, ecapa_loss=0.0001583, whisper_loss=0.08018, over 16232.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01063, ecapa_loss=0.0001416, whisper_loss=0.0884, over 3796743.99 frames.
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:44:45,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4577490.0, ans=0.125 2024-08-19 23:44:51,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.57 vs. limit=10.0 2024-08-19 23:45:41,807 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 13 from LS+wenet, 15 from Vox, 21 from AS 2024-08-19 23:45:43,171 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13200, loss[loss=0.08413, beats_loss=0.01061, ecapa_loss=0.0001254, whisper_loss=0.07226, over 12963.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01062, ecapa_loss=0.0001406, whisper_loss=0.08814, over 3784600.99 frames. ], batch size: 49, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:45:50,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=12.0 2024-08-19 23:46:03,253 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 26 from LS+wenet, 23 from Vox, 28 from AS 2024-08-19 23:46:06,707 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 from AS 2024-08-19 23:46:09,131 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts.
17 from LS+wenet, 20 from Vox, 35 from AS 2024-08-19 23:46:32,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4578090.0, ans=0.1 2024-08-19 23:46:50,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4578190.0, ans=0.0 2024-08-19 23:47:05,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.290e+01 2.445e+01 2.755e+01 3.843e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-19 23:47:08,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.70 vs. limit=6.0 2024-08-19 23:47:12,577 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13250, loss[loss=0.1244, beats_loss=0.006508, ecapa_loss=0.0001827, whisper_loss=0.1161, over 17330.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01055, ecapa_loss=0.000141, whisper_loss=0.08865, over 3789508.45 frames. ], batch size: 69, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:47:14,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2024-08-19 23:47:30,208 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:47:35,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4578490.0, ans=0.0 2024-08-19 23:47:39,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4578490.0, ans=0.2 2024-08-19 23:47:46,374 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts.
20 from LS+wenet, 13 from Vox, 18 from AS 2024-08-19 23:47:54,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4578590.0, ans=0.125 2024-08-19 23:47:58,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4578590.0, ans=0.2 2024-08-19 23:48:02,366 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 from AS 2024-08-19 23:48:05,504 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 24 from LS+wenet, 22 from Vox, 33 from AS 2024-08-19 23:48:05,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4578590.0, ans=0.125 2024-08-19 23:48:13,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4578690.0, ans=0.125 2024-08-19 23:48:19,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4578690.0, ans=0.0 2024-08-19 23:48:43,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4578790.0, ans=0.1 2024-08-19 23:48:45,462 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 from AS 2024-08-19 23:48:51,795 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13300, loss[loss=0.08711, beats_loss=0.01161, ecapa_loss=0.0001232, whisper_loss=0.07427, over 15079.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01057, ecapa_loss=0.0001409, whisper_loss=0.08842, over 3794872.33 frames.
], batch size: 62, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:49:16,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4578990.0, ans=0.125 2024-08-19 23:49:23,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4578990.0, ans=0.0 2024-08-19 23:49:36,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4579090.0, ans=10.0 2024-08-19 23:49:44,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4579090.0, ans=0.125 2024-08-19 23:49:49,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4579190.0, ans=0.125 2024-08-19 23:50:03,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4579290.0, ans=0.125 2024-08-19 23:50:17,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.270e+01 2.522e+01 2.849e+01 4.114e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 23:50:24,317 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13350, loss[loss=0.1108, beats_loss=0.009383, ecapa_loss=0.0001377, whisper_loss=0.1, over 16244.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.08847, over 3780506.85 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:50:24,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4579390.0, ans=0.0 2024-08-19 23:50:38,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.18 vs. 
limit=15.0 2024-08-19 23:50:41,400 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.982e+00 2024-08-19 23:50:45,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.37 vs. limit=10.0 2024-08-19 23:51:06,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.64 vs. limit=10.0 2024-08-19 23:51:09,700 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 from AS 2024-08-19 23:51:10,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4579590.0, ans=0.125 2024-08-19 23:51:23,142 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 from AS 2024-08-19 23:51:23,376 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:51:58,110 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13400, loss[loss=0.09759, beats_loss=0.009386, ecapa_loss=0.0001437, whisper_loss=0.08677, over 15773.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0105, ecapa_loss=0.0001422, whisper_loss=0.08831, over 3770536.78 frames. ], batch size: 63, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:52:53,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4580090.0, ans=0.125 2024-08-19 23:53:08,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-19 23:53:19,009 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts.
11 from LS+wenet, 15 from Vox, 27 from AS 2024-08-19 23:53:25,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.574e+01 2.343e+01 2.589e+01 2.932e+01 2.538e+02, threshold=5.179e+01, percent-clipped=4.0 2024-08-19 23:53:28,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4580290.0, ans=0.0 2024-08-19 23:53:33,086 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13450, loss[loss=0.07964, beats_loss=0.01167, ecapa_loss=0.0001356, whisper_loss=0.06662, over 16846.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01053, ecapa_loss=0.0001415, whisper_loss=0.08867, over 3774581.01 frames. ], batch size: 68, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:54:03,858 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:54:07,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4580490.0, ans=0.0 2024-08-19 23:54:32,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4580690.0, ans=0.125 2024-08-19 23:54:32,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4580690.0, ans=0.1 2024-08-19 23:54:34,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2024-08-19 23:54:41,127 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts.
24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 23:54:41,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4580690.0, ans=0.125 2024-08-19 23:54:46,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4580690.0, ans=0.125 2024-08-19 23:55:10,924 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13500, loss[loss=0.0852, beats_loss=0.01328, ecapa_loss=0.0001238, whisper_loss=0.07068, over 22381.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.08939, over 3811416.09 frames. ], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:55:24,645 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 17 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-19 23:55:57,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4581090.0, ans=0.2 2024-08-19 23:56:01,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4581090.0, ans=0.0 2024-08-19 23:56:01,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4581090.0, ans=0.125 2024-08-19 23:56:13,025 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 23:56:16,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4581190.0, ans=0.0 2024-08-19 23:56:20,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4581190.0, ans=0.125 2024-08-19 23:56:35,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.361e+01 2.616e+01 2.856e+01 5.147e+01, threshold=5.232e+01, percent-clipped=0.0 2024-08-19 23:56:39,738 INFO 
[train_multi_KD3.py:845] (0/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 23:56:41,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4581390.0, ans=0.0 2024-08-19 23:56:43,171 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13550, loss[loss=0.08064, beats_loss=0.01111, ecapa_loss=0.0001345, whisper_loss=0.06819, over 15055.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001396, whisper_loss=0.08945, over 3824901.16 frames. ], batch size: 61, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:57:03,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4581490.0, ans=0.125 2024-08-19 23:57:09,264 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 20 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-19 23:57:33,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4581590.0, ans=0.1 2024-08-19 23:57:33,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4581590.0, ans=0.0 2024-08-19 23:57:35,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4581590.0, ans=0.125 2024-08-19 23:57:42,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2024-08-19 23:57:48,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. 
limit=15.0 2024-08-19 23:57:57,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4581790.0, ans=0.1 2024-08-19 23:58:06,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4581790.0, ans=0.0 2024-08-19 23:58:17,086 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13600, loss[loss=0.08315, beats_loss=0.008469, ecapa_loss=0.0001336, whisper_loss=0.07335, over 13419.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001407, whisper_loss=0.08909, over 3804542.90 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:58:21,720 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 23:58:42,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4581990.0, ans=0.1 2024-08-19 23:58:53,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4582090.0, ans=0.0 2024-08-19 23:58:59,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4582090.0, ans=0.0 2024-08-19 23:59:10,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4582190.0, ans=0.0 2024-08-19 23:59:29,572 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 23:59:39,900 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.261e+01 2.439e+01 2.757e+01 6.326e+01, threshold=4.878e+01, percent-clipped=1.0 2024-08-19 23:59:47,344 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13650, loss[loss=0.1257, beats_loss=0.009268, ecapa_loss=0.0001214, whisper_loss=0.1152, over 16167.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001404, whisper_loss=0.08982, over 3760852.32 frames. ], batch size: 61, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-19 23:59:49,193 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 23:59:57,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4582390.0, ans=0.0 2024-08-19 23:59:59,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=22.5 2024-08-20 00:00:23,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4582590.0, ans=0.2 2024-08-20 00:00:40,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4582590.0, ans=0.0 2024-08-20 00:01:02,521 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 29 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 00:01:13,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-08-20 00:01:14,969 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 15 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 00:01:19,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.40 vs. limit=8.0 2024-08-20 00:01:20,256 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13700, loss[loss=0.1063, beats_loss=0.01177, ecapa_loss=0.0001183, whisper_loss=0.09334, over 22987.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001407, whisper_loss=0.08935, over 3764827.44 frames. 
], batch size: 90, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:01:27,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4582890.0, ans=0.125 2024-08-20 00:01:29,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4582890.0, ans=0.125 2024-08-20 00:01:36,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4582990.0, ans=0.0 2024-08-20 00:01:41,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4582990.0, ans=0.125 2024-08-20 00:01:44,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4582990.0, ans=0.0 2024-08-20 00:01:48,083 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 31 from LS+wenet, 32 from Vox, 32 fro AS 2024-08-20 00:02:00,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4583090.0, ans=0.0 2024-08-20 00:02:03,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4583090.0, ans=0.125 2024-08-20 00:02:14,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4583090.0, ans=0.1 2024-08-20 00:02:20,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4583190.0, ans=0.0 2024-08-20 00:02:22,925 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 21 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 00:02:34,338 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
37 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 00:02:46,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.351e+01 2.599e+01 2.817e+01 2.023e+02, threshold=5.198e+01, percent-clipped=1.0 2024-08-20 00:02:54,316 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13750, loss[loss=0.115, beats_loss=0.009673, ecapa_loss=0.0001331, whisper_loss=0.104, over 20941.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001409, whisper_loss=0.08957, over 3767138.95 frames. ], batch size: 79, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:02:54,562 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 00:02:54,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4583390.0, ans=0.125 2024-08-20 00:03:03,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4583390.0, ans=0.125 2024-08-20 00:03:19,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4583490.0, ans=0.09899494936611666 2024-08-20 00:03:19,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4583490.0, ans=0.05 2024-08-20 00:03:21,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4583490.0, ans=0.0 2024-08-20 00:03:30,066 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 
20 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 00:03:45,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4583590.0, ans=0.125 2024-08-20 00:03:45,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4583590.0, ans=0.0 2024-08-20 00:03:53,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4583690.0, ans=0.1 2024-08-20 00:04:01,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4583690.0, ans=0.1 2024-08-20 00:04:14,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4583790.0, ans=0.125 2024-08-20 00:04:25,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=22.5 2024-08-20 00:04:27,656 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13800, loss[loss=0.1079, beats_loss=0.01042, ecapa_loss=0.0001364, whisper_loss=0.09612, over 23314.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01051, ecapa_loss=0.0001417, whisper_loss=0.08859, over 3785929.49 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:04:50,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2024-08-20 00:05:11,270 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
28 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 00:05:11,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=4584090.0, ans=15.0 2024-08-20 00:05:26,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4584190.0, ans=10.0 2024-08-20 00:05:35,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4584190.0, ans=0.125 2024-08-20 00:05:51,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.269e+01 2.538e+01 2.800e+01 5.388e+01, threshold=5.076e+01, percent-clipped=1.0 2024-08-20 00:05:57,936 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13850, loss[loss=0.08057, beats_loss=0.01255, ecapa_loss=0.0001352, whisper_loss=0.06667, over 21672.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001407, whisper_loss=0.08895, over 3783601.94 frames. ], batch size: 89, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:05:58,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4584390.0, ans=0.125 2024-08-20 00:05:59,800 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 00:06:14,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2024-08-20 00:06:27,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4584490.0, ans=0.0 2024-08-20 00:06:46,403 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 00:06:57,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4584690.0, ans=0.035 2024-08-20 00:07:15,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4584790.0, ans=0.1 2024-08-20 00:07:18,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4584790.0, ans=0.125 2024-08-20 00:07:27,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4584790.0, ans=0.0 2024-08-20 00:07:30,772 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13900, loss[loss=0.08636, beats_loss=0.01582, ecapa_loss=0.0001016, whisper_loss=0.06952, over 22488.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01056, ecapa_loss=0.0001407, whisper_loss=0.089, over 3790619.33 frames. ], batch size: 92, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:08:22,512 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 23 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-20 00:08:31,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4585190.0, ans=0.5 2024-08-20 00:08:31,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.93 vs. 
limit=6.0 2024-08-20 00:08:54,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4585290.0, ans=0.0 2024-08-20 00:08:56,349 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.301e+01 2.533e+01 2.957e+01 6.862e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-20 00:09:03,981 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 13950, loss[loss=0.1065, beats_loss=0.01162, ecapa_loss=0.0001181, whisper_loss=0.09374, over 24205.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01066, ecapa_loss=0.0001403, whisper_loss=0.08858, over 3804830.24 frames. ], batch size: 97, lr: 1.96e-03, grad_scale: 1.152921504606847e+18 2024-08-20 00:09:25,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4585490.0, ans=0.125 2024-08-20 00:09:43,757 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 29 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-20 00:10:00,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4585690.0, ans=0.125 2024-08-20 00:10:09,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4585690.0, ans=0.025 2024-08-20 00:10:18,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4585790.0, ans=0.0 2024-08-20 00:10:40,044 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14000, loss[loss=0.1099, beats_loss=0.01209, ecapa_loss=0.0001088, whisper_loss=0.09668, over 20996.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01066, ecapa_loss=0.0001401, whisper_loss=0.08858, over 3790810.52 frames. ], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:10:43,392 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 00:10:51,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.01 vs. limit=6.0 2024-08-20 00:10:52,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4585890.0, ans=0.0 2024-08-20 00:11:17,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4586090.0, ans=0.0 2024-08-20 00:11:25,712 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 36 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 00:11:35,287 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 00:12:07,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.217e+01 2.438e+01 2.736e+01 1.084e+02, threshold=4.877e+01, percent-clipped=1.0 2024-08-20 00:12:12,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=22.5 2024-08-20 00:12:15,035 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14050, loss[loss=0.09298, beats_loss=0.01155, ecapa_loss=0.000128, whisper_loss=0.08015, over 18869.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01064, ecapa_loss=0.0001402, whisper_loss=0.08867, over 3783377.79 frames. 
], batch size: 77, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:12:44,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4586490.0, ans=0.09899494936611666 2024-08-20 00:13:23,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4586690.0, ans=0.1 2024-08-20 00:13:24,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4586690.0, ans=0.1 2024-08-20 00:13:31,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4586790.0, ans=0.0 2024-08-20 00:13:49,161 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14100, loss[loss=0.09236, beats_loss=0.01182, ecapa_loss=0.0001125, whisper_loss=0.07942, over 19880.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01058, ecapa_loss=0.0001401, whisper_loss=0.08861, over 3751117.59 frames. ], batch size: 77, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:13:51,273 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 00:14:04,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4586890.0, ans=0.0 2024-08-20 00:14:06,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4586990.0, ans=0.0 2024-08-20 00:14:14,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4586990.0, ans=0.2 2024-08-20 00:14:21,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4586990.0, ans=0.2 2024-08-20 00:14:23,494 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
17 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-20 00:14:27,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4587090.0, ans=0.2 2024-08-20 00:14:40,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4587090.0, ans=0.125 2024-08-20 00:14:41,820 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 27 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 00:15:04,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4587290.0, ans=0.0 2024-08-20 00:15:17,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4587290.0, ans=0.125 2024-08-20 00:15:18,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.279e+01 2.555e+01 2.827e+01 5.250e+01, threshold=5.111e+01, percent-clipped=1.0 2024-08-20 00:15:24,374 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14150, loss[loss=0.1082, beats_loss=0.01064, ecapa_loss=0.0001218, whisper_loss=0.09636, over 23217.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001405, whisper_loss=0.08913, over 3760494.87 frames. 
], batch size: 91, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:16:03,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4587590.0, ans=0.0 2024-08-20 00:16:11,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4587590.0, ans=0.0 2024-08-20 00:16:21,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4587690.0, ans=0.125 2024-08-20 00:16:25,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4587690.0, ans=0.07 2024-08-20 00:16:33,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4587690.0, ans=0.0 2024-08-20 00:16:36,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4587690.0, ans=0.04949747468305833 2024-08-20 00:16:49,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4587790.0, ans=0.125 2024-08-20 00:16:56,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4587790.0, ans=0.0 2024-08-20 00:16:59,920 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14200, loss[loss=0.09802, beats_loss=0.01346, ecapa_loss=9.687e-05, whisper_loss=0.08359, over 14677.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.08955, over 3757093.58 frames. ], batch size: 56, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:17:18,464 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
27 from LS+wenet, 9 from Vox, 15 fro AS 2024-08-20 00:17:20,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4587990.0, ans=0.125 2024-08-20 00:17:34,751 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:17:51,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4588090.0, ans=0.05 2024-08-20 00:18:24,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4588290.0, ans=0.0 2024-08-20 00:18:26,117 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 00:18:27,135 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.209e+01 2.487e+01 2.835e+01 4.985e+01, threshold=4.974e+01, percent-clipped=0.0 2024-08-20 00:18:33,061 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14250, loss[loss=0.1242, beats_loss=0.009418, ecapa_loss=0.0001523, whisper_loss=0.1133, over 22333.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001412, whisper_loss=0.0908, over 3795299.29 frames. ], batch size: 88, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:18:47,989 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 19 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-20 00:18:48,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4588390.0, ans=0.2 2024-08-20 00:18:51,837 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 
18 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-20 00:19:20,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4588590.0, ans=0.125 2024-08-20 00:19:20,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-20 00:19:23,928 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 00:19:31,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4588690.0, ans=0.04949747468305833 2024-08-20 00:19:41,487 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 00:20:06,757 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14300, loss[loss=0.08009, beats_loss=0.01065, ecapa_loss=0.0001548, whisper_loss=0.06789, over 11970.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.0001402, whisper_loss=0.09091, over 3759656.86 frames. ], batch size: 49, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:20:24,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4588990.0, ans=0.125 2024-08-20 00:20:31,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4588990.0, ans=0.125 2024-08-20 00:20:35,304 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 29 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 00:20:46,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.28 vs. 
limit=22.5 2024-08-20 00:21:24,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4589290.0, ans=0.125 2024-08-20 00:21:29,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4589290.0, ans=0.125 2024-08-20 00:21:36,002 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.288e+01 2.504e+01 2.843e+01 5.964e+01, threshold=5.008e+01, percent-clipped=1.0 2024-08-20 00:21:36,267 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 00:21:42,163 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14350, loss[loss=0.07853, beats_loss=0.01447, ecapa_loss=0.0001345, whisper_loss=0.06272, over 14840.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01022, ecapa_loss=0.0001415, whisper_loss=0.09105, over 3719155.21 frames. ], batch size: 60, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:21:49,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4589390.0, ans=0.125 2024-08-20 00:22:55,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4589790.0, ans=0.125 2024-08-20 00:23:02,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=4589790.0, ans=12.0 2024-08-20 00:23:03,452 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:23:14,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4589790.0, ans=0.0 2024-08-20 00:23:16,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. 
limit=15.0 2024-08-20 00:23:16,988 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14400, loss[loss=0.08844, beats_loss=0.006709, ecapa_loss=0.0002046, whisper_loss=0.07969, over 12230.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01022, ecapa_loss=0.0001412, whisper_loss=0.0912, over 3739877.27 frames. ], batch size: 51, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:24:11,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4590190.0, ans=0.0 2024-08-20 00:24:15,745 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 00:24:22,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4590190.0, ans=0.125 2024-08-20 00:24:28,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=12.0 2024-08-20 00:24:41,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.268e+01 2.508e+01 2.742e+01 3.367e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 00:24:48,101 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14450, loss[loss=0.07206, beats_loss=0.01264, ecapa_loss=9.654e-05, whisper_loss=0.05846, over 17635.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01027, ecapa_loss=0.0001401, whisper_loss=0.09065, over 3717989.70 frames. ], batch size: 69, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:24:57,768 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 00:25:05,946 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.895e+00 2024-08-20 00:25:13,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4590490.0, ans=0.125 2024-08-20 00:25:19,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4590490.0, ans=0.125 2024-08-20 00:25:26,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4590590.0, ans=0.0 2024-08-20 00:25:43,695 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 00:25:48,939 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 00:26:02,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4590690.0, ans=0.2 2024-08-20 00:26:09,719 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 16 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-20 00:26:20,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4590790.0, ans=0.1 2024-08-20 00:26:24,558 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14500, loss[loss=0.09989, beats_loss=0.008973, ecapa_loss=0.0001142, whisper_loss=0.08977, over 20582.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01028, ecapa_loss=0.0001406, whisper_loss=0.08983, over 3718677.46 frames. 
], batch size: 80, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:26:25,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4590890.0, ans=0.1 2024-08-20 00:26:32,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4590890.0, ans=0.0 2024-08-20 00:26:53,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4590990.0, ans=0.025 2024-08-20 00:27:02,389 WARNING [optim.py:496] (0/4) Scaling gradients by 0.034884583204984665, model_norm_threshold=50.15473556518555 2024-08-20 00:27:02,546 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.43, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.848e+05, grad_sumsq=8.328e+07, orig_rms_sq=1.062e-02 2024-08-20 00:27:15,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4591090.0, ans=0.035 2024-08-20 00:27:33,072 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 00:27:52,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.301e+01 2.496e+01 2.802e+01 1.438e+03, threshold=4.992e+01, percent-clipped=1.0 2024-08-20 00:27:59,125 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14550, loss[loss=0.1157, beats_loss=0.01023, ecapa_loss=0.0001436, whisper_loss=0.104, over 17420.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01033, ecapa_loss=0.0001398, whisper_loss=0.0894, over 3723771.30 frames. 
], batch size: 65, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:28:03,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4591390.0, ans=0.2 2024-08-20 00:28:05,028 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 25 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 00:28:16,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4591490.0, ans=0.05 2024-08-20 00:28:22,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4591490.0, ans=0.025 2024-08-20 00:28:27,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4591490.0, ans=0.125 2024-08-20 00:28:38,876 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-20 00:28:40,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4591590.0, ans=0.125 2024-08-20 00:28:42,409 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 00:29:04,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4591690.0, ans=0.125 2024-08-20 00:29:18,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4591790.0, ans=0.0 2024-08-20 00:29:20,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0 2024-08-20 00:29:24,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. 
limit=15.0 2024-08-20 00:29:33,051 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14600, loss[loss=0.1247, beats_loss=0.008092, ecapa_loss=0.0001639, whisper_loss=0.1149, over 21326.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01026, ecapa_loss=0.0001413, whisper_loss=0.08913, over 3726803.33 frames. ], batch size: 84, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:30:17,741 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 00:30:38,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4592190.0, ans=0.125 2024-08-20 00:30:42,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4592190.0, ans=0.125 2024-08-20 00:30:52,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4592290.0, ans=0.1 2024-08-20 00:31:02,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.406e+01 2.621e+01 2.917e+01 4.385e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-20 00:31:02,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4592290.0, ans=0.125 2024-08-20 00:31:02,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4592290.0, ans=0.0 2024-08-20 00:31:07,357 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14650, loss[loss=0.1136, beats_loss=0.0103, ecapa_loss=0.000138, whisper_loss=0.1019, over 15807.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01035, ecapa_loss=0.00014, whisper_loss=0.08933, over 3754499.04 frames. ], batch size: 61, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:31:17,233 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
25 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 00:31:20,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4592390.0, ans=0.1 2024-08-20 00:31:30,186 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 00:31:33,572 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 00:32:23,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4592790.0, ans=0.0 2024-08-20 00:32:29,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4592790.0, ans=0.0 2024-08-20 00:32:37,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4592790.0, ans=0.1 2024-08-20 00:32:38,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4592790.0, ans=0.2 2024-08-20 00:32:41,428 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14700, loss[loss=0.06294, beats_loss=0.01146, ecapa_loss=0.0001291, whisper_loss=0.05019, over 12973.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08953, over 3773747.63 frames. ], batch size: 53, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:32:42,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. 
limit=15.0 2024-08-20 00:32:55,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4592890.0, ans=0.125 2024-08-20 00:33:17,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4593090.0, ans=0.125 2024-08-20 00:33:44,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4593190.0, ans=0.0 2024-08-20 00:33:45,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2024-08-20 00:33:54,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4593190.0, ans=0.1 2024-08-20 00:34:12,264 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.392e+01 2.545e+01 2.884e+01 3.743e+01, threshold=5.090e+01, percent-clipped=0.0 2024-08-20 00:34:17,491 INFO [train_multi_KD3.py:1117] (0/4) Epoch 31, batch 14750, loss[loss=0.09569, beats_loss=0.0106, ecapa_loss=0.0001239, whisper_loss=0.08385, over 16537.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.08987, over 3781543.03 frames. ], batch size: 64, lr: 1.96e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:34:19,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4593390.0, ans=0.2 2024-08-20 00:34:39,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4593490.0, ans=0.125 2024-08-20 00:34:39,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4593490.0, ans=0.125 2024-08-20 00:35:21,657 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-20 00:35:32,238 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-31.pt 2024-08-20 00:36:08,654 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 0, loss[loss=0.1084, beats_loss=0.01048, ecapa_loss=0.0001127, whisper_loss=0.09677, over 15202.00 frames. ], tot_loss[loss=0.1084, beats_loss=0.01048, ecapa_loss=0.0001127, whisper_loss=0.09677, over 15202.00 frames. ], batch size: 57, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:36:08,655 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 00:36:23,269 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0648, 4.3938, 4.3119, 3.9783], device='cuda:0') 2024-08-20 00:36:43,382 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005131, whisper_loss=0.2488, over 931116.00 frames. 2024-08-20 00:37:05,822 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 2024-08-20 00:38:39,895 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 00:38:39,899 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 00:38:41,501 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 00:38:44,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4593800.0, ans=0.125 2024-08-20 00:39:05,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4593900.0, ans=0.125 2024-08-20 00:39:16,353 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 00:39:24,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-20 00:39:46,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=22.5 2024-08-20 00:39:51,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4594000.0, ans=0.2 2024-08-20 00:39:55,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4594100.0, ans=0.0 2024-08-20 00:40:02,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4594100.0, ans=0.2 2024-08-20 00:40:10,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2024-08-20 00:40:39,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4594300.0, ans=0.1 2024-08-20 00:40:40,565 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 50, loss[loss=0.1142, beats_loss=0.007661, ecapa_loss=0.0001448, whisper_loss=0.1051, over 18326.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.009346, ecapa_loss=0.00015, whisper_loss=0.09157, over 886647.44 frames. ], batch size: 68, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:40:43,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4594300.0, ans=0.0 2024-08-20 00:40:55,063 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.478e+01 2.729e+01 3.043e+01 3.966e+01, threshold=5.458e+01, percent-clipped=0.0 2024-08-20 00:40:58,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4594300.0, ans=0.0 2024-08-20 00:41:39,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4594500.0, ans=0.125 2024-08-20 00:41:48,923 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 00:41:53,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2024-08-20 00:42:31,794 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 00:42:34,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.36 vs. limit=10.0 2024-08-20 00:42:38,540 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 100, loss[loss=0.09713, beats_loss=0.008634, ecapa_loss=0.0001433, whisper_loss=0.08706, over 18786.00 frames. ], tot_loss[loss=0.0995, beats_loss=0.009203, ecapa_loss=0.000145, whisper_loss=0.08885, over 1532557.26 frames. 
], batch size: 75, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:42:44,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4594800.0, ans=0.04949747468305833 2024-08-20 00:42:54,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4594800.0, ans=0.125 2024-08-20 00:42:54,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4594800.0, ans=0.125 2024-08-20 00:43:01,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-20 00:43:07,500 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 38 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 00:43:19,338 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 21 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 00:43:30,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4595000.0, ans=0.0 2024-08-20 00:43:45,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4595100.0, ans=0.125 2024-08-20 00:43:50,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4595100.0, ans=0.2 2024-08-20 00:44:01,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4595100.0, ans=0.125 2024-08-20 00:44:29,837 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 
16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 00:44:30,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2024-08-20 00:44:33,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.91 vs. limit=10.0 2024-08-20 00:44:37,713 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 150, loss[loss=0.1016, beats_loss=0.01136, ecapa_loss=0.0001263, whisper_loss=0.08902, over 17785.00 frames. ], tot_loss[loss=0.09963, beats_loss=0.009186, ecapa_loss=0.0001441, whisper_loss=0.089, over 2002154.00 frames. ], batch size: 69, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:44:40,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4595300.0, ans=0.125 2024-08-20 00:44:50,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.031e+01 2.529e+01 2.741e+01 3.091e+01 3.915e+01, threshold=5.483e+01, percent-clipped=0.0 2024-08-20 00:45:13,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4595400.0, ans=0.0 2024-08-20 00:45:20,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4595500.0, ans=0.125 2024-08-20 00:45:23,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4595500.0, ans=0.125 2024-08-20 00:45:40,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4595600.0, ans=0.07 2024-08-20 00:46:11,243 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
20 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 00:46:12,732 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 200, loss[loss=0.08511, beats_loss=0.01021, ecapa_loss=0.0001544, whisper_loss=0.07335, over 19910.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0094, ecapa_loss=0.0001424, whisper_loss=0.08928, over 2361333.64 frames. ], batch size: 80, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:46:18,982 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.329e+00 2024-08-20 00:46:45,628 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03673094883561134, model_norm_threshold=54.82755661010742 2024-08-20 00:46:45,786 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.205e+05, grad_sumsq=3.205e+05, orig_rms_sq=1.000e+00 2024-08-20 00:46:49,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4596000.0, ans=0.2 2024-08-20 00:46:54,476 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 00:47:02,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4596000.0, ans=0.0 2024-08-20 00:47:06,651 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.387e-02 2024-08-20 00:47:16,718 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
31 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 00:47:20,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4596100.0, ans=0.125 2024-08-20 00:47:43,267 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 250, loss[loss=0.1196, beats_loss=0.009461, ecapa_loss=0.000122, whisper_loss=0.1089, over 21858.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.009795, ecapa_loss=0.0001419, whisper_loss=0.08905, over 2643572.44 frames. ], batch size: 83, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:47:43,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4596300.0, ans=0.125 2024-08-20 00:47:53,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.311e+01 2.593e+01 2.981e+01 1.493e+03, threshold=5.185e+01, percent-clipped=1.0 2024-08-20 00:48:02,699 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 00:48:04,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4596400.0, ans=0.125 2024-08-20 00:48:23,372 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 00:48:30,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4596500.0, ans=0.125 2024-08-20 00:48:35,416 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 00:48:52,667 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
20 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-20 00:48:57,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4596700.0, ans=0.015 2024-08-20 00:48:57,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4596700.0, ans=0.0 2024-08-20 00:49:09,724 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 300, loss[loss=0.1086, beats_loss=0.009569, ecapa_loss=0.0001378, whisper_loss=0.09766, over 18730.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.009994, ecapa_loss=0.0001396, whisper_loss=0.08902, over 2868374.86 frames. ], batch size: 70, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:49:25,994 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03387049213051796, model_norm_threshold=51.854286193847656 2024-08-20 00:49:26,148 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.005e+05, grad_sumsq=9.116e+04, orig_rms_sq=3.297e+00 2024-08-20 00:49:35,039 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-20 00:50:10,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4597100.0, ans=0.2 2024-08-20 00:50:12,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4597100.0, ans=0.0 2024-08-20 00:50:14,119 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-20 00:50:37,086 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 350, loss[loss=0.09599, beats_loss=0.01239, ecapa_loss=0.0001381, whisper_loss=0.08222, over 19725.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01025, ecapa_loss=0.0001393, whisper_loss=0.08831, over 3077517.08 frames. 
], batch size: 81, lr: 1.93e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:50:47,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4597300.0, ans=0.2 2024-08-20 00:50:48,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.225e+01 2.468e+01 2.778e+01 1.531e+03, threshold=4.937e+01, percent-clipped=2.0 2024-08-20 00:50:48,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4597300.0, ans=0.125 2024-08-20 00:50:57,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=15.0 2024-08-20 00:51:20,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4597500.0, ans=0.125 2024-08-20 00:51:37,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4597600.0, ans=0.95 2024-08-20 00:51:56,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4597700.0, ans=0.125 2024-08-20 00:52:04,905 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 400, loss[loss=0.09777, beats_loss=0.009852, ecapa_loss=0.0001517, whisper_loss=0.08641, over 22029.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01029, ecapa_loss=0.0001386, whisper_loss=0.0885, over 3234980.44 frames. ], batch size: 91, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:52:23,096 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 
32 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 00:52:39,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4598000.0, ans=0.0 2024-08-20 00:52:51,350 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 32 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-20 00:52:57,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4598100.0, ans=0.125 2024-08-20 00:53:13,822 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 00:53:35,338 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 450, loss[loss=0.09113, beats_loss=0.00998, ecapa_loss=0.0001284, whisper_loss=0.07987, over 17897.00 frames. ], tot_loss[loss=0.09965, beats_loss=0.01033, ecapa_loss=0.0001384, whisper_loss=0.08794, over 3344565.82 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:53:42,073 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-20 00:53:45,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.300e+01 2.526e+01 2.780e+01 3.592e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-20 00:54:06,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4598400.0, ans=0.125 2024-08-20 00:54:16,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4598500.0, ans=0.0 2024-08-20 00:54:52,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.06 vs. 
limit=22.5 2024-08-20 00:54:53,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4598700.0, ans=0.0 2024-08-20 00:55:01,777 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 500, loss[loss=0.08865, beats_loss=0.009776, ecapa_loss=0.0001093, whisper_loss=0.07779, over 14095.00 frames. ], tot_loss[loss=0.09946, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.08776, over 3415416.34 frames. ], batch size: 51, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:55:11,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4598800.0, ans=0.0 2024-08-20 00:55:18,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4598900.0, ans=0.2 2024-08-20 00:55:41,117 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 00:56:13,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4599200.0, ans=0.0 2024-08-20 00:56:26,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4599200.0, ans=0.125 2024-08-20 00:56:27,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4599200.0, ans=0.125 2024-08-20 00:56:31,181 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 550, loss[loss=0.1182, beats_loss=0.009285, ecapa_loss=0.0001527, whisper_loss=0.1074, over 22164.00 frames. ], tot_loss[loss=0.09941, beats_loss=0.01035, ecapa_loss=0.0001384, whisper_loss=0.08768, over 3450492.62 frames. 
], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:56:41,835 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.287e+01 2.466e+01 2.719e+01 4.330e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-20 00:56:54,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4599400.0, ans=0.125 2024-08-20 00:57:02,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4599400.0, ans=0.05 2024-08-20 00:57:23,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.46 vs. limit=10.0 2024-08-20 00:57:55,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4599700.0, ans=0.0 2024-08-20 00:57:59,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4599700.0, ans=15.0 2024-08-20 00:58:02,740 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 14 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 00:58:03,840 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 600, loss[loss=0.08138, beats_loss=0.01258, ecapa_loss=0.0001188, whisper_loss=0.06762, over 14195.00 frames. ], tot_loss[loss=0.09936, beats_loss=0.01038, ecapa_loss=0.0001375, whisper_loss=0.08761, over 3532534.25 frames. 
], batch size: 56, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:58:38,701 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-460000.pt 2024-08-20 00:58:44,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.49 vs. limit=10.0 2024-08-20 00:58:47,600 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 00:58:53,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4600000.0, ans=0.125 2024-08-20 00:59:01,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4600100.0, ans=0.125 2024-08-20 00:59:08,972 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-20 00:59:24,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4600200.0, ans=0.0 2024-08-20 00:59:30,619 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 20 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-20 00:59:35,289 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 650, loss[loss=0.08941, beats_loss=0.009143, ecapa_loss=0.0001424, whisper_loss=0.07884, over 15223.00 frames. ], tot_loss[loss=0.09924, beats_loss=0.01059, ecapa_loss=0.0001374, whisper_loss=0.08728, over 3611669.81 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 00:59:41,316 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 00:59:43,294 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 00:59:46,513 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.232e+01 2.532e+01 2.844e+01 3.570e+02, threshold=5.065e+01, percent-clipped=2.0 2024-08-20 01:00:20,697 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07462587207555771, model_norm_threshold=50.64724349975586 2024-08-20 01:00:20,856 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.123e+04, grad_sumsq=9.123e+04, orig_rms_sq=1.000e+00 2024-08-20 01:00:36,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0 2024-08-20 01:00:40,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4600600.0, ans=0.125 2024-08-20 01:00:52,785 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 01:01:05,203 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 700, loss[loss=0.1166, beats_loss=0.009905, ecapa_loss=0.0001413, whisper_loss=0.1053, over 16576.00 frames. ], tot_loss[loss=0.09959, beats_loss=0.01046, ecapa_loss=0.0001391, whisper_loss=0.08774, over 3607727.04 frames. ], batch size: 65, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:01:13,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. 
limit=10.0 2024-08-20 01:01:22,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4600900.0, ans=0.125 2024-08-20 01:01:47,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4601000.0, ans=0.125 2024-08-20 01:01:56,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4601000.0, ans=0.125 2024-08-20 01:02:04,720 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 01:02:04,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4601100.0, ans=0.0 2024-08-20 01:02:06,305 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 01:02:10,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4601100.0, ans=0.125 2024-08-20 01:02:13,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4601100.0, ans=0.025 2024-08-20 01:02:24,257 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 01:02:26,092 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 01:02:34,862 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 750, loss[loss=0.1223, beats_loss=0.0079, ecapa_loss=0.0001415, whisper_loss=0.1129, over 20423.00 frames. ], tot_loss[loss=0.0995, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.08766, over 3659845.16 frames. ], batch size: 78, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:02:37,016 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 01:02:45,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.348e+01 2.626e+01 2.965e+01 6.787e+02, threshold=5.252e+01, percent-clipped=3.0 2024-08-20 01:03:02,545 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 11 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 01:03:18,468 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 21 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 01:03:19,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4601500.0, ans=0.125 2024-08-20 01:03:24,924 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 14 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 01:03:28,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4601600.0, ans=15.0 2024-08-20 01:03:30,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4601600.0, ans=15.0 2024-08-20 01:03:36,939 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 01:03:43,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4601700.0, ans=0.1 2024-08-20 01:03:50,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4601700.0, ans=0.2 2024-08-20 01:03:54,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4601700.0, ans=0.2 2024-08-20 01:04:00,099 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 800, loss[loss=0.1216, beats_loss=0.007228, ecapa_loss=0.0001683, whisper_loss=0.1127, over 21183.00 frames. ], tot_loss[loss=0.0989, beats_loss=0.01043, ecapa_loss=0.0001396, whisper_loss=0.08708, over 3673277.65 frames. 
], batch size: 84, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:04:09,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4601800.0, ans=0.0 2024-08-20 01:04:13,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4601800.0, ans=0.125 2024-08-20 01:04:22,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.72 vs. limit=10.0 2024-08-20 01:04:41,775 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-20 01:04:41,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4602000.0, ans=0.0 2024-08-20 01:04:49,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4602000.0, ans=0.1 2024-08-20 01:05:08,134 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 01:05:13,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4602200.0, ans=0.2 2024-08-20 01:05:16,725 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 01:05:26,414 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 850, loss[loss=0.08679, beats_loss=0.009043, ecapa_loss=0.0001679, whisper_loss=0.07606, over 16674.00 frames. ], tot_loss[loss=0.09955, beats_loss=0.01033, ecapa_loss=0.0001399, whisper_loss=0.08782, over 3680644.93 frames. 
], batch size: 69, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:05:27,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4602300.0, ans=0.0 2024-08-20 01:05:37,196 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.270e+01 2.498e+01 2.868e+01 4.208e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-20 01:05:44,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.54 vs. limit=22.5 2024-08-20 01:05:51,250 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 01:06:09,778 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 01:06:17,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4602600.0, ans=0.125 2024-08-20 01:06:20,220 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 20 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-20 01:06:20,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4602600.0, ans=0.0 2024-08-20 01:06:34,191 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 18 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 01:06:43,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-20 01:06:47,875 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 18 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 01:06:53,210 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 900, loss[loss=0.1, beats_loss=0.01141, ecapa_loss=0.0001091, whisper_loss=0.08753, over 23240.00 frames. 
], tot_loss[loss=0.09967, beats_loss=0.01036, ecapa_loss=0.0001395, whisper_loss=0.08792, over 3675037.88 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 01:06:53,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4602800.0, ans=0.2 2024-08-20 01:07:10,749 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 01:07:23,209 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-20 01:07:23,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4602900.0, ans=0.2 2024-08-20 01:07:30,529 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.621e-01 2024-08-20 01:07:32,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4603000.0, ans=0.1 2024-08-20 01:07:47,588 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 16 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 01:07:55,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-20 01:08:04,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4603200.0, ans=0.0 2024-08-20 01:08:06,661 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 14 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 01:08:10,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4603200.0, ans=0.2 2024-08-20 01:08:19,546 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 950, loss[loss=0.08268, beats_loss=0.01248, ecapa_loss=0.0001366, whisper_loss=0.06883, over 19930.00 frames. 
], tot_loss[loss=0.09897, beats_loss=0.01042, ecapa_loss=0.0001391, whisper_loss=0.08716, over 3711838.54 frames. ], batch size: 82, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:08:25,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4603300.0, ans=0.125 2024-08-20 01:08:31,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.373e+01 2.705e+01 3.029e+01 3.919e+02, threshold=5.410e+01, percent-clipped=3.0 2024-08-20 01:08:42,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4603400.0, ans=0.0 2024-08-20 01:09:12,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2024-08-20 01:09:27,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4603700.0, ans=0.0 2024-08-20 01:09:29,234 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 01:09:37,939 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 01:09:46,125 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1000, loss[loss=0.09796, beats_loss=0.009085, ecapa_loss=0.0001578, whisper_loss=0.08729, over 21201.00 frames. ], tot_loss[loss=0.09921, beats_loss=0.01039, ecapa_loss=0.0001388, whisper_loss=0.08743, over 3728865.83 frames. 
], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:09:50,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4603800.0, ans=0.0 2024-08-20 01:09:51,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4603800.0, ans=0.1 2024-08-20 01:10:07,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4603900.0, ans=0.125 2024-08-20 01:10:08,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4603900.0, ans=0.125 2024-08-20 01:10:19,811 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 01:10:37,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4604000.0, ans=22.5 2024-08-20 01:10:54,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4604100.0, ans=0.5 2024-08-20 01:11:14,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4604200.0, ans=0.125 2024-08-20 01:11:18,638 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1050, loss[loss=0.07112, beats_loss=0.01199, ecapa_loss=0.0001005, whisper_loss=0.05813, over 18164.00 frames. ], tot_loss[loss=0.09917, beats_loss=0.01037, ecapa_loss=0.0001375, whisper_loss=0.08743, over 3704692.73 frames. ], batch size: 70, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:11:21,053 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 01:11:32,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.222e+01 2.426e+01 2.735e+01 4.130e+01, threshold=4.852e+01, percent-clipped=0.0 2024-08-20 01:11:40,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2024-08-20 01:11:43,509 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-20 01:11:49,322 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 01:11:49,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4604400.0, ans=0.0 2024-08-20 01:12:00,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4604500.0, ans=0.125 2024-08-20 01:12:09,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4604500.0, ans=0.0 2024-08-20 01:12:10,630 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 01:12:12,494 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 01:12:35,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4604700.0, ans=0.125 2024-08-20 01:12:44,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4604700.0, ans=0.2 2024-08-20 01:12:49,405 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1100, loss[loss=0.09323, beats_loss=0.01464, ecapa_loss=9.889e-05, whisper_loss=0.0776, over 18004.00 frames. 
], tot_loss[loss=0.09929, beats_loss=0.01029, ecapa_loss=0.0001375, whisper_loss=0.08762, over 3735774.53 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:12:57,544 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 01:12:59,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.80 vs. limit=10.0 2024-08-20 01:13:14,572 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 01:13:26,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4605000.0, ans=0.0 2024-08-20 01:13:28,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4605000.0, ans=0.2 2024-08-20 01:13:50,375 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 01:13:57,633 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 01:14:07,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4605200.0, ans=0.125 2024-08-20 01:14:15,526 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1150, loss[loss=0.09422, beats_loss=0.01182, ecapa_loss=0.0001461, whisper_loss=0.08094, over 16149.00 frames. ], tot_loss[loss=0.09974, beats_loss=0.01037, ecapa_loss=0.0001374, whisper_loss=0.08799, over 3749641.64 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:14:15,771 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 01:14:17,182 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
17 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 01:14:24,512 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 01:14:26,272 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 01:14:27,509 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.314e+01 2.565e+01 2.766e+01 1.499e+02, threshold=5.130e+01, percent-clipped=2.0 2024-08-20 01:14:27,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4605300.0, ans=0.05 2024-08-20 01:14:28,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-20 01:14:32,582 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 22 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-20 01:14:34,473 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 01:14:41,245 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 01:14:43,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4605400.0, ans=10.0 2024-08-20 01:14:48,319 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.014e+01 2024-08-20 01:14:51,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4605500.0, ans=10.0 2024-08-20 01:15:11,469 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
20 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-20 01:15:18,632 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.377e-03 2024-08-20 01:15:22,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4605700.0, ans=0.1 2024-08-20 01:15:30,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-20 01:15:40,907 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1200, loss[loss=0.102, beats_loss=0.009366, ecapa_loss=0.0001224, whisper_loss=0.09138, over 23572.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01034, ecapa_loss=0.0001373, whisper_loss=0.08865, over 3744126.85 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:15:52,283 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 01:16:04,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2024-08-20 01:16:07,231 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 9 from Vox, 35 fro AS 2024-08-20 01:16:38,077 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 01:16:47,624 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 01:16:49,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4606100.0, ans=0.1 2024-08-20 01:16:51,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.17 vs. 
limit=15.0 2024-08-20 01:16:56,475 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-20 01:17:15,238 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1250, loss[loss=0.06607, beats_loss=0.01263, ecapa_loss=0.0001299, whisper_loss=0.05215, over 14409.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01035, ecapa_loss=0.0001372, whisper_loss=0.08868, over 3736283.85 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:17:32,635 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.240e+01 2.537e+01 2.870e+01 6.660e+01, threshold=5.073e+01, percent-clipped=2.0 2024-08-20 01:17:44,788 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 17 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 01:17:49,443 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 23 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 01:18:05,708 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 01:18:20,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4606500.0, ans=0.125 2024-08-20 01:18:28,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=28.73 vs. limit=22.5 2024-08-20 01:18:49,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4606600.0, ans=0.0 2024-08-20 01:19:06,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4606700.0, ans=0.125 2024-08-20 01:19:08,345 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 01:19:13,290 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1300, loss[loss=0.09082, beats_loss=0.01134, ecapa_loss=0.0001469, whisper_loss=0.07801, over 16009.00 frames. ], tot_loss[loss=0.09994, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.08818, over 3746895.09 frames. ], batch size: 67, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:19:18,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-20 01:19:32,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4606900.0, ans=0.035 2024-08-20 01:19:32,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4606900.0, ans=10.0 2024-08-20 01:19:36,707 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 15 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 01:19:45,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4606900.0, ans=0.1 2024-08-20 01:19:45,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4606900.0, ans=0.125 2024-08-20 01:19:55,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4607000.0, ans=0.125 2024-08-20 01:20:03,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4607000.0, ans=0.5 2024-08-20 01:20:07,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4607000.0, ans=0.035 2024-08-20 01:20:28,911 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
12 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 01:21:03,545 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1350, loss[loss=0.1124, beats_loss=0.0103, ecapa_loss=0.0001273, whisper_loss=0.1009, over 19167.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01036, ecapa_loss=0.0001382, whisper_loss=0.08869, over 3743449.74 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:21:22,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.244e+01 2.406e+01 2.687e+01 4.080e+01, threshold=4.812e+01, percent-clipped=0.0 2024-08-20 01:21:39,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4607400.0, ans=0.0 2024-08-20 01:21:49,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4607500.0, ans=0.1 2024-08-20 01:21:58,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4607500.0, ans=0.125 2024-08-20 01:22:18,422 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 38 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 01:22:25,406 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 01:22:30,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4607600.0, ans=0.0 2024-08-20 01:22:45,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4607700.0, ans=0.0 2024-08-20 01:22:50,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4607700.0, ans=0.125 2024-08-20 01:22:50,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4607700.0, ans=0.125 2024-08-20 01:22:57,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4607700.0, ans=0.5 2024-08-20 01:23:07,432 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1400, loss[loss=0.1024, beats_loss=0.008209, ecapa_loss=0.0001165, whisper_loss=0.09306, over 18094.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01033, ecapa_loss=0.0001386, whisper_loss=0.08935, over 3822416.09 frames. ], batch size: 66, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:23:07,692 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 20 from LS+wenet, 13 from Vox, 17 fro AS 2024-08-20 01:23:12,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4607800.0, ans=0.125 2024-08-20 01:23:48,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4607900.0, ans=0.0 2024-08-20 01:24:10,600 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
12 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 01:24:13,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4608000.0, ans=0.125 2024-08-20 01:24:13,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4608000.0, ans=0.0 2024-08-20 01:24:15,039 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 11 from LS+wenet, 11 from Vox, 40 fro AS 2024-08-20 01:24:22,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4608100.0, ans=22.5 2024-08-20 01:25:04,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-08-20 01:25:06,542 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0315290167927742, model_norm_threshold=48.11598205566406 2024-08-20 01:25:06,696 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.963e+05, grad_sumsq=4.963e+05, orig_rms_sq=1.000e+00 2024-08-20 01:25:09,011 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1450, loss[loss=0.09437, beats_loss=0.01016, ecapa_loss=0.0001386, whisper_loss=0.08283, over 14292.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01034, ecapa_loss=0.000138, whisper_loss=0.08893, over 3782041.79 frames. ], batch size: 56, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:25:26,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.252e+01 2.461e+01 2.741e+01 1.526e+03, threshold=4.922e+01, percent-clipped=2.0 2024-08-20 01:26:16,995 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 14 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 01:26:51,456 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
20 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 01:27:02,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4608600.0, ans=0.1 2024-08-20 01:27:13,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4608700.0, ans=0.1 2024-08-20 01:27:19,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4608700.0, ans=0.0 2024-08-20 01:27:31,044 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1500, loss[loss=0.0979, beats_loss=0.0116, ecapa_loss=0.0001316, whisper_loss=0.08499, over 22560.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01034, ecapa_loss=0.0001376, whisper_loss=0.08852, over 3774744.76 frames. ], batch size: 91, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:27:35,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4608800.0, ans=0.1 2024-08-20 01:27:57,353 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 01:27:59,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4608900.0, ans=0.1 2024-08-20 01:28:04,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4608900.0, ans=0.125 2024-08-20 01:28:12,451 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 01:28:19,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4609000.0, ans=0.125 2024-08-20 01:28:41,748 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
26 from LS+wenet, 13 from Vox, 30 from AS 2024-08-20 01:28:47,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4609100.0, ans=0.125 2024-08-20 01:28:55,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4609200.0, ans=0.125 2024-08-20 01:29:01,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4609200.0, ans=0.125 2024-08-20 01:29:13,093 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1550, loss[loss=0.0825, beats_loss=0.0137, ecapa_loss=0.0001029, whisper_loss=0.06777, over 22217.00 frames. ], tot_loss[loss=0.09973, beats_loss=0.01038, ecapa_loss=0.000137, whisper_loss=0.08798, over 3766308.59 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:29:17,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2024-08-20 01:29:27,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.175e+01 2.465e+01 2.675e+01 6.220e+01, threshold=4.930e+01, percent-clipped=1.0 2024-08-20 01:29:59,537 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 21 from LS+wenet, 14 from Vox, 27 from AS 2024-08-20 01:30:01,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4609500.0, ans=0.0 2024-08-20 01:30:12,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4609600.0, ans=0.125 2024-08-20 01:30:38,121 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-20 01:30:49,735 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1600, loss[loss=0.1132, beats_loss=0.009018, ecapa_loss=0.0001311, whisper_loss=0.1029, over 22428.00 frames. ], tot_loss[loss=0.09951, beats_loss=0.01039, ecapa_loss=0.0001366, whisper_loss=0.08775, over 3763291.22 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:30:52,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4609800.0, ans=0.0 2024-08-20 01:31:06,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4609900.0, ans=0.125 2024-08-20 01:31:09,187 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 from AS 2024-08-20 01:31:20,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4609900.0, ans=0.2 2024-08-20 01:31:22,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4609900.0, ans=0.0 2024-08-20 01:31:37,247 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 from AS 2024-08-20 01:31:49,700 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 from AS 2024-08-20 01:31:56,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4610100.0, ans=0.1 2024-08-20 01:31:58,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4610100.0, ans=0.95 2024-08-20 01:32:05,109 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
34 from LS+wenet, 17 from Vox, 41 from AS 2024-08-20 01:32:24,534 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1650, loss[loss=0.09746, beats_loss=0.009247, ecapa_loss=0.0001394, whisper_loss=0.08682, over 21243.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01025, ecapa_loss=0.0001376, whisper_loss=0.08814, over 3736651.86 frames. ], batch size: 84, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:32:39,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.232e+01 2.495e+01 2.715e+01 1.384e+02, threshold=4.990e+01, percent-clipped=1.0 2024-08-20 01:32:48,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.11 vs. limit=22.5 2024-08-20 01:33:18,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.92 vs. limit=15.0 2024-08-20 01:33:38,518 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 25 from LS+wenet, 13 from Vox, 40 from AS 2024-08-20 01:33:51,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-20 01:33:57,988 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1700, loss[loss=0.09819, beats_loss=0.009814, ecapa_loss=0.0001157, whisper_loss=0.08722, over 16011.00 frames. ], tot_loss[loss=0.1, beats_loss=0.0103, ecapa_loss=0.0001372, whisper_loss=0.08837, over 3732782.09 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:34:03,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4610800.0, ans=0.09899494936611666 2024-08-20 01:34:19,160 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
28 from LS+wenet, 19 from Vox, 44 from AS 2024-08-20 01:34:34,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4611000.0, ans=0.2 2024-08-20 01:34:38,257 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 from AS 2024-08-20 01:35:26,065 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1750, loss[loss=0.09794, beats_loss=0.008742, ecapa_loss=0.0001647, whisper_loss=0.08755, over 16523.00 frames. ], tot_loss[loss=0.09998, beats_loss=0.01022, ecapa_loss=0.0001382, whisper_loss=0.08838, over 3732702.61 frames. ], batch size: 68, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:35:38,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.241e+01 2.449e+01 2.717e+01 4.269e+01, threshold=4.898e+01, percent-clipped=0.0 2024-08-20 01:35:42,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4611400.0, ans=0.0 2024-08-20 01:35:45,362 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS 2024-08-20 01:36:01,079 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
18 from LS+wenet, 21 from Vox, 16 from AS 2024-08-20 01:36:06,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4611500.0, ans=0.125 2024-08-20 01:36:06,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4611500.0, ans=0.2 2024-08-20 01:36:26,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4611600.0, ans=0.125 2024-08-20 01:36:27,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4611600.0, ans=0.05 2024-08-20 01:36:52,764 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1800, loss[loss=0.1029, beats_loss=0.01115, ecapa_loss=0.0001204, whisper_loss=0.09057, over 17611.00 frames. ], tot_loss[loss=0.09983, beats_loss=0.01032, ecapa_loss=0.0001373, whisper_loss=0.08814, over 3741835.14 frames. 
], batch size: 69, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:37:00,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4611800.0, ans=0.0 2024-08-20 01:37:10,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4611900.0, ans=0.125 2024-08-20 01:37:12,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4611900.0, ans=0.125 2024-08-20 01:37:33,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4612000.0, ans=0.0 2024-08-20 01:37:46,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4612100.0, ans=0.125 2024-08-20 01:37:54,965 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 from AS 2024-08-20 01:38:07,356 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 from AS 2024-08-20 01:38:18,890 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1850, loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001417, whisper_loss=0.09027, over 15396.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01029, ecapa_loss=0.0001371, whisper_loss=0.0885, over 3714910.95 frames. ], batch size: 62, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:38:31,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.236e+01 2.438e+01 2.690e+01 3.613e+01, threshold=4.877e+01, percent-clipped=0.0 2024-08-20 01:38:34,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4612400.0, ans=0.125 2024-08-20 01:38:45,121 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
28 from LS+wenet, 12 from Vox, 39 from AS 2024-08-20 01:38:47,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.01 vs. limit=22.5 2024-08-20 01:38:52,148 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 14 from LS+wenet, 13 from Vox, 24 from AS 2024-08-20 01:39:07,788 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 25 from LS+wenet, 18 from Vox, 37 from AS 2024-08-20 01:39:11,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4612600.0, ans=0.125 2024-08-20 01:39:15,545 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.225e-01 2024-08-20 01:39:19,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4612600.0, ans=0.125 2024-08-20 01:39:22,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4612600.0, ans=0.0 2024-08-20 01:39:34,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-20 01:39:45,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4612800.0, ans=0.0 2024-08-20 01:39:47,176 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1900, loss[loss=0.1146, beats_loss=0.01118, ecapa_loss=0.000135, whisper_loss=0.1021, over 21921.00 frames. ], tot_loss[loss=0.09961, beats_loss=0.01033, ecapa_loss=0.0001366, whisper_loss=0.08791, over 3696554.90 frames. 
], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:39:50,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4612800.0, ans=0.125 2024-08-20 01:40:05,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4612900.0, ans=0.125 2024-08-20 01:40:20,568 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 23 from LS+wenet, 12 from Vox, 38 from AS 2024-08-20 01:40:48,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4613100.0, ans=0.125 2024-08-20 01:40:50,151 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 from AS 2024-08-20 01:40:52,323 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS 2024-08-20 01:41:14,212 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 1950, loss[loss=0.1027, beats_loss=0.008647, ecapa_loss=0.0001228, whisper_loss=0.09284, over 15491.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01041, ecapa_loss=0.0001356, whisper_loss=0.08849, over 3716831.82 frames. 
], batch size: 59, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:41:26,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.348e+01 2.572e+01 2.844e+01 4.490e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-20 01:41:40,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4613400.0, ans=0.125 2024-08-20 01:42:01,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4613500.0, ans=0.125 2024-08-20 01:42:01,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4613500.0, ans=0.1 2024-08-20 01:42:15,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2024-08-20 01:42:20,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4613600.0, ans=0.125 2024-08-20 01:42:22,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2024-08-20 01:42:39,930 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2000, loss[loss=0.09702, beats_loss=0.00989, ecapa_loss=0.0001468, whisper_loss=0.08566, over 16025.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01034, ecapa_loss=0.0001358, whisper_loss=0.08868, over 3699510.47 frames. 
], batch size: 64, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:42:42,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4613800.0, ans=0.125 2024-08-20 01:42:46,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=22.5 2024-08-20 01:42:49,021 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 23 from LS+wenet, 17 from Vox, 20 from AS 2024-08-20 01:42:57,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4613900.0, ans=0.1 2024-08-20 01:43:04,428 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0779990628361702, model_norm_threshold=51.44282531738281 2024-08-20 01:43:04,586 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.291e+04, grad_sumsq=4.291e+04, orig_rms_sq=1.000e+00 2024-08-20 01:43:09,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=22.5 2024-08-20 01:43:23,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4614000.0, ans=0.0 2024-08-20 01:43:25,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4614000.0, ans=0.2 2024-08-20 01:43:27,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4614000.0, ans=0.1 2024-08-20 01:43:38,241 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
30 from LS+wenet, 26 from Vox, 36 from AS 2024-08-20 01:43:40,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4614100.0, ans=0.025 2024-08-20 01:43:45,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.11 vs. limit=5.0 2024-08-20 01:44:07,477 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2050, loss[loss=0.09811, beats_loss=0.009352, ecapa_loss=0.0001072, whisper_loss=0.08769, over 16614.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01032, ecapa_loss=0.000135, whisper_loss=0.08908, over 3731982.23 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:44:14,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4614300.0, ans=0.1 2024-08-20 01:44:19,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.219e+01 2.452e+01 2.809e+01 6.595e+02, threshold=4.904e+01, percent-clipped=1.0 2024-08-20 01:44:25,029 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 01:44:37,068 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 32 from LS+wenet, 21 from Vox, 30 from AS 2024-08-20 01:44:59,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. 
limit=15.0 2024-08-20 01:45:01,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4614600.0, ans=0.1 2024-08-20 01:45:08,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4614600.0, ans=0.1 2024-08-20 01:45:33,354 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2100, loss[loss=0.1042, beats_loss=0.009591, ecapa_loss=0.0001141, whisper_loss=0.0935, over 16572.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01038, ecapa_loss=0.0001346, whisper_loss=0.08898, over 3751394.18 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:45:37,462 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 27 from LS+wenet, 17 from Vox, 24 from AS 2024-08-20 01:45:54,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4614900.0, ans=0.125 2024-08-20 01:45:59,102 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 11 from LS+wenet, 19 from Vox, 20 from AS 2024-08-20 01:46:02,417 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 from AS 2024-08-20 01:46:37,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-20 01:46:40,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4615200.0, ans=0.1 2024-08-20 01:46:48,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0 2024-08-20 01:47:00,095 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2150, loss[loss=0.1098, beats_loss=0.01113, ecapa_loss=0.0001792, whisper_loss=0.0969, over 18727.00 frames. 
], tot_loss[loss=0.1004, beats_loss=0.01033, ecapa_loss=0.0001344, whisper_loss=0.08868, over 3727326.33 frames. ], batch size: 80, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:47:01,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2024-08-20 01:47:05,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2024-08-20 01:47:09,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4615300.0, ans=0.07 2024-08-20 01:47:12,058 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.213e+01 2.411e+01 2.746e+01 4.203e+01, threshold=4.821e+01, percent-clipped=0.0 2024-08-20 01:47:12,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4615300.0, ans=0.1 2024-08-20 01:47:13,989 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 from AS 2024-08-20 01:47:14,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4615300.0, ans=0.125 2024-08-20 01:47:30,673 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 25 from LS+wenet, 21 from Vox, 33 from AS 2024-08-20 01:47:31,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. 
limit=22.5 2024-08-20 01:48:15,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4615700.0, ans=0.2 2024-08-20 01:48:24,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4615800.0, ans=0.125 2024-08-20 01:48:25,544 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2200, loss[loss=0.1073, beats_loss=0.009445, ecapa_loss=0.0001415, whisper_loss=0.09647, over 22650.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01036, ecapa_loss=0.0001346, whisper_loss=0.0885, over 3716352.62 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:48:27,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4615800.0, ans=0.1 2024-08-20 01:48:46,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4615900.0, ans=10.0 2024-08-20 01:49:24,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-20 01:49:31,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4616200.0, ans=0.125 2024-08-20 01:49:35,319 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 from AS 2024-08-20 01:49:45,774 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 31 from LS+wenet, 19 from Vox, 32 from AS 2024-08-20 01:49:50,343 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2250, loss[loss=0.1252, beats_loss=0.009148, ecapa_loss=0.0001203, whisper_loss=0.1148, over 22446.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01026, ecapa_loss=0.000136, whisper_loss=0.09012, over 3710037.63 frames. 
], batch size: 84, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:50:02,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.187e+01 2.427e+01 2.680e+01 3.409e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-20 01:50:09,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4616400.0, ans=10.0 2024-08-20 01:50:13,185 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 12 from LS+wenet, 24 from Vox, 42 from AS 2024-08-20 01:50:26,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4616500.0, ans=0.125 2024-08-20 01:50:30,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4616500.0, ans=0.0 2024-08-20 01:50:36,880 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 22 from LS+wenet, 22 from Vox, 28 from AS 2024-08-20 01:50:37,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4616500.0, ans=0.125 2024-08-20 01:51:15,685 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2300, loss[loss=0.1349, beats_loss=0.009204, ecapa_loss=0.0001396, whisper_loss=0.1243, over 22448.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01027, ecapa_loss=0.0001364, whisper_loss=0.09117, over 3729923.20 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:51:39,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4616900.0, ans=0.05 2024-08-20 01:51:58,630 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
25 from LS+wenet, 27 from Vox, 39 from AS 2024-08-20 01:52:00,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4617000.0, ans=0.0 2024-08-20 01:52:11,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-08-20 01:52:26,645 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 from AS 2024-08-20 01:52:30,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4617200.0, ans=0.125 2024-08-20 01:52:39,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4617200.0, ans=0.1 2024-08-20 01:52:40,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4617200.0, ans=0.1 2024-08-20 01:52:43,083 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2350, loss[loss=0.1108, beats_loss=0.007607, ecapa_loss=0.0001274, whisper_loss=0.102, over 14693.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01028, ecapa_loss=0.0001368, whisper_loss=0.09164, over 3787663.14 frames. ], batch size: 53, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:52:51,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.42 vs. limit=10.0 2024-08-20 01:52:55,223 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.315e+01 2.598e+01 2.990e+01 3.797e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-20 01:52:59,302 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
21 from LS+wenet, 19 from Vox, 29 from AS 2024-08-20 01:53:03,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4617400.0, ans=0.1 2024-08-20 01:53:17,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4617500.0, ans=0.025 2024-08-20 01:53:24,878 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 from AS 2024-08-20 01:53:40,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4617600.0, ans=15.0 2024-08-20 01:53:45,166 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 from AS 2024-08-20 01:53:56,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2024-08-20 01:54:07,165 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2400, loss[loss=0.08082, beats_loss=0.01234, ecapa_loss=0.0001309, whisper_loss=0.06718, over 21748.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0103, ecapa_loss=0.0001383, whisper_loss=0.09105, over 3781585.81 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:54:13,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2024-08-20 01:54:16,687 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
23 from LS+wenet, 27 from Vox, 37 from AS 2024-08-20 01:54:19,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4617800.0, ans=0.125 2024-08-20 01:54:45,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4618000.0, ans=0.125 2024-08-20 01:55:33,038 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2450, loss[loss=0.1093, beats_loss=0.008796, ecapa_loss=0.0001659, whisper_loss=0.09882, over 20236.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001383, whisper_loss=0.09039, over 3777998.11 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:55:42,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-20 01:55:45,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.204e+01 2.412e+01 2.711e+01 4.337e+02, threshold=4.825e+01, percent-clipped=1.0 2024-08-20 01:55:47,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4618300.0, ans=0.125 2024-08-20 01:56:27,366 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS 2024-08-20 01:56:38,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4618600.0, ans=0.0 2024-08-20 01:56:56,754 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-20 01:57:03,899 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2500, loss[loss=0.1079, beats_loss=0.008564, ecapa_loss=0.0001383, whisper_loss=0.09798, over 20743.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01025, ecapa_loss=0.000139, whisper_loss=0.09085, over 3805718.43 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:57:04,216 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 from AS 2024-08-20 01:57:17,172 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 33 from LS+wenet, 17 from Vox, 25 from AS 2024-08-20 01:57:43,323 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 13 from LS+wenet, 13 from Vox, 26 from AS 2024-08-20 01:57:51,683 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 34 from LS+wenet, 16 from Vox, 35 from AS 2024-08-20 01:57:53,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4619000.0, ans=0.125 2024-08-20 01:57:54,795 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 15 from LS+wenet, 14 from Vox, 23 from AS 2024-08-20 01:58:07,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-08-20 01:58:17,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4619200.0, ans=0.1 2024-08-20 01:58:32,180 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2550, loss[loss=0.0839, beats_loss=0.0122, ecapa_loss=0.0001167, whisper_loss=0.07053, over 15372.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01026, ecapa_loss=0.0001387, whisper_loss=0.09097, over 3782830.49 frames. 
], batch size: 60, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 01:58:36,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4619300.0, ans=0.0 2024-08-20 01:58:44,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.624e+01 2.306e+01 2.523e+01 2.847e+01 3.512e+02, threshold=5.047e+01, percent-clipped=2.0 2024-08-20 01:58:44,667 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-20 01:59:21,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4619500.0, ans=0.0 2024-08-20 01:59:28,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4619600.0, ans=0.125 2024-08-20 01:59:35,225 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 12 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 01:59:46,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4619700.0, ans=0.125 2024-08-20 01:59:50,288 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 02:00:01,014 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2600, loss[loss=0.097, beats_loss=0.00956, ecapa_loss=0.0001362, whisper_loss=0.08608, over 19566.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001382, whisper_loss=0.09045, over 3824270.85 frames. 
], batch size: 77, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:00:17,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4619900.0, ans=0.125 2024-08-20 02:00:20,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4619900.0, ans=0.125 2024-08-20 02:00:43,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4620000.0, ans=0.2 2024-08-20 02:01:30,038 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2650, loss[loss=0.08963, beats_loss=0.01148, ecapa_loss=0.0001339, whisper_loss=0.07681, over 22457.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001393, whisper_loss=0.0905, over 3822245.41 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:01:42,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.354e+01 2.571e+01 2.953e+01 6.961e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-20 02:02:03,946 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 
33 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 02:02:07,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4620500.0, ans=0.1 2024-08-20 02:02:15,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4620500.0, ans=0.09899494936611666 2024-08-20 02:02:19,900 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:02:25,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4620600.0, ans=0.125 2024-08-20 02:02:46,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4620700.0, ans=0.1 2024-08-20 02:02:57,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4620800.0, ans=0.0 2024-08-20 02:02:58,543 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2700, loss[loss=0.07785, beats_loss=0.01184, ecapa_loss=0.0001248, whisper_loss=0.06477, over 15209.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.09009, over 3816554.63 frames. ], batch size: 60, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:03:00,750 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 
27 from LS+wenet, 21 from Vox, 48 fro AS 2024-08-20 02:03:26,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4620900.0, ans=0.2 2024-08-20 02:03:29,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4620900.0, ans=0.0 2024-08-20 02:03:49,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4621100.0, ans=0.0 2024-08-20 02:03:55,691 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 22 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 02:04:01,008 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 02:04:05,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-20 02:04:24,791 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2750, loss[loss=0.104, beats_loss=0.01059, ecapa_loss=0.0001391, whisper_loss=0.09197, over 23199.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001401, whisper_loss=0.08914, over 3800482.88 frames. ], batch size: 95, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:04:36,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.283e+01 2.512e+01 2.707e+01 3.446e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 02:04:41,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4621400.0, ans=0.125 2024-08-20 02:05:17,796 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 02:05:46,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4621700.0, ans=0.0 2024-08-20 02:05:53,029 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2800, loss[loss=0.1109, beats_loss=0.009151, ecapa_loss=0.0001216, whisper_loss=0.1005, over 18586.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.0887, over 3789951.90 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:06:26,414 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 38 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 02:06:35,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4622000.0, ans=0.1 2024-08-20 02:06:53,734 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 02:06:57,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-20 02:07:22,912 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2850, loss[loss=0.08147, beats_loss=0.01185, ecapa_loss=0.000134, whisper_loss=0.06829, over 21413.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08989, over 3809310.43 frames. ], batch size: 87, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:07:35,619 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.251e+01 2.480e+01 2.760e+01 4.318e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-20 02:07:44,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4622400.0, ans=0.2 2024-08-20 02:07:55,201 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 02:08:35,723 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 02:08:41,407 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 02:08:41,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4622700.0, ans=0.125 2024-08-20 02:08:46,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4622700.0, ans=0.125 2024-08-20 02:08:52,970 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2900, loss[loss=0.1001, beats_loss=0.007685, ecapa_loss=0.0001311, whisper_loss=0.09112, over 16528.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001396, whisper_loss=0.09032, over 3837349.99 frames. ], batch size: 63, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 02:08:53,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4622800.0, ans=0.125 2024-08-20 02:08:54,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-08-20 02:09:11,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4622900.0, ans=0.1 2024-08-20 02:09:25,811 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 02:09:45,476 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 02:10:22,385 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 2950, loss[loss=0.1094, beats_loss=0.009205, ecapa_loss=0.0001452, whisper_loss=0.09875, over 20811.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001395, whisper_loss=0.09127, over 3866595.30 frames. ], batch size: 80, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:10:34,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.285e+01 2.491e+01 2.729e+01 3.693e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 02:10:35,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4623300.0, ans=0.125 2024-08-20 02:10:41,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4623400.0, ans=0.1 2024-08-20 02:10:43,174 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 02:11:01,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2024-08-20 02:11:02,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4623500.0, ans=0.125 2024-08-20 02:11:06,162 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 21 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 02:11:06,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4623500.0, ans=0.125 2024-08-20 02:11:16,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4623600.0, ans=0.1 2024-08-20 02:11:17,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. 
limit=22.5 2024-08-20 02:11:23,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4623600.0, ans=0.0 2024-08-20 02:11:36,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=12.0 2024-08-20 02:11:44,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4623700.0, ans=0.2 2024-08-20 02:11:48,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4623800.0, ans=0.125 2024-08-20 02:11:48,925 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3000, loss[loss=0.08698, beats_loss=0.01298, ecapa_loss=0.0001097, whisper_loss=0.0729, over 19635.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.09007, over 3848598.50 frames. ], batch size: 78, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:11:48,926 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 02:12:25,559 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.000511, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 02:12:46,558 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on SV_voxceleb1: loss=0.003941, beats_loss=0, ecapa_loss=0.0003941, whisper_loss=0, over 944235.00 frames. 
2024-08-20 02:13:02,625 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1690, 3.1236, 3.2812, 3.0281], device='cuda:0') 2024-08-20 02:13:13,170 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.5695, 3.3906, 3.9533, 3.6952], device='cuda:0') 2024-08-20 02:14:20,865 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on AT_audioset: loss=0.02293, beats_loss=0.02293, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 02:14:20,869 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 02:14:43,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=22.5 2024-08-20 02:14:54,133 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 25 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 02:15:11,021 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 16 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 02:15:44,442 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3050, loss[loss=0.1157, beats_loss=0.009612, ecapa_loss=0.0001476, whisper_loss=0.1046, over 18243.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.08959, over 3847240.44 frames. ], batch size: 72, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:15:49,856 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
20 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 02:15:50,013 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:15:54,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4624300.0, ans=0.125 2024-08-20 02:15:56,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.348e+01 2.639e+01 2.982e+01 8.249e+01, threshold=5.278e+01, percent-clipped=1.0 2024-08-20 02:16:01,598 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 02:16:08,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4624400.0, ans=0.1 2024-08-20 02:16:10,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4624400.0, ans=0.07 2024-08-20 02:16:28,480 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 02:16:29,639 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 02:16:38,660 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 02:17:09,617 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3100, loss[loss=0.1315, beats_loss=0.008226, ecapa_loss=0.0001516, whisper_loss=0.1218, over 20545.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08997, over 3836940.95 frames. ], batch size: 81, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:17:23,128 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
29 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 02:17:44,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4625000.0, ans=0.125 2024-08-20 02:18:12,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4625100.0, ans=0.0 2024-08-20 02:18:29,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4625200.0, ans=0.1 2024-08-20 02:18:33,640 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3150, loss[loss=0.07685, beats_loss=0.01306, ecapa_loss=0.0001423, whisper_loss=0.06236, over 17145.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001411, whisper_loss=0.08964, over 3824228.01 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:18:44,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.262e+01 2.448e+01 2.716e+01 4.425e+01, threshold=4.896e+01, percent-clipped=0.0 2024-08-20 02:18:52,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4625400.0, ans=0.0 2024-08-20 02:18:56,674 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 29 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 02:19:01,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4625400.0, ans=0.125 2024-08-20 02:19:20,070 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 02:19:31,880 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
24 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-20 02:19:33,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4625600.0, ans=0.125 2024-08-20 02:19:42,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4625700.0, ans=22.5 2024-08-20 02:19:54,046 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 02:19:56,719 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3200, loss[loss=0.1415, beats_loss=0.00625, ecapa_loss=0.0001618, whisper_loss=0.1337, over 17559.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.09031, over 3799695.81 frames. ], batch size: 67, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:20:08,854 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 02:20:10,316 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 36 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-20 02:20:10,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4625800.0, ans=0.0 2024-08-20 02:20:24,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.43 vs. limit=10.0 2024-08-20 02:20:24,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. 
limit=15.0 2024-08-20 02:20:49,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4626100.0, ans=0.125 2024-08-20 02:20:51,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4626100.0, ans=0.2 2024-08-20 02:20:59,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4626100.0, ans=0.125 2024-08-20 02:21:04,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4626200.0, ans=0.2 2024-08-20 02:21:16,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4626200.0, ans=10.0 2024-08-20 02:21:17,141 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-20 02:21:20,038 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3250, loss[loss=0.1219, beats_loss=0.009809, ecapa_loss=0.0001487, whisper_loss=0.1106, over 23084.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.0001418, whisper_loss=0.09119, over 3783254.62 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:21:27,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4626300.0, ans=0.125 2024-08-20 02:21:32,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.285e+01 2.517e+01 2.834e+01 4.980e+01, threshold=5.034e+01, percent-clipped=1.0 2024-08-20 02:22:02,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4626500.0, ans=0.2 2024-08-20 02:22:15,839 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 02:22:16,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4626600.0, ans=0.5 2024-08-20 02:22:26,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2024-08-20 02:22:40,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4626700.0, ans=0.1 2024-08-20 02:22:42,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-20 02:22:46,848 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3300, loss[loss=0.1015, beats_loss=0.01262, ecapa_loss=0.0001286, whisper_loss=0.08756, over 22515.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01031, ecapa_loss=0.0001414, whisper_loss=0.09173, over 3815329.91 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:22:53,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4626800.0, ans=0.125 2024-08-20 02:23:05,424 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 21 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-20 02:23:15,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4626900.0, ans=0.125 2024-08-20 02:23:20,347 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 02:23:33,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4627000.0, ans=15.0 2024-08-20 02:23:34,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4627100.0, ans=0.125 2024-08-20 02:23:36,414 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 02:23:39,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.34 vs. limit=22.5 2024-08-20 02:24:08,863 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3350, loss[loss=0.1034, beats_loss=0.0102, ecapa_loss=0.0001582, whisper_loss=0.09163, over 21524.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001416, whisper_loss=0.09122, over 3815336.16 frames. ], batch size: 90, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:24:20,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.203e+01 2.406e+01 2.784e+01 4.307e+01, threshold=4.813e+01, percent-clipped=0.0 2024-08-20 02:24:42,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4627500.0, ans=0.1 2024-08-20 02:25:31,856 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-20 02:25:32,819 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3400, loss[loss=0.09929, beats_loss=0.01136, ecapa_loss=0.0001376, whisper_loss=0.08656, over 22002.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001417, whisper_loss=0.09114, over 3836185.93 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:25:40,485 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-20 02:25:52,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4627900.0, ans=0.0 2024-08-20 02:25:54,734 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 02:25:54,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4627900.0, ans=10.0 2024-08-20 02:26:33,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4628100.0, ans=0.95 2024-08-20 02:26:35,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4628100.0, ans=0.125 2024-08-20 02:26:55,322 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3450, loss[loss=0.1094, beats_loss=0.00909, ecapa_loss=0.000113, whisper_loss=0.09919, over 15832.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01028, ecapa_loss=0.0001412, whisper_loss=0.09161, over 3856557.31 frames. 
], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:27:07,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.276e+01 2.600e+01 2.959e+01 4.699e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-20 02:27:09,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4628300.0, ans=0.2 2024-08-20 02:27:14,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4628400.0, ans=0.125 2024-08-20 02:27:28,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4628500.0, ans=0.125 2024-08-20 02:27:30,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4628500.0, ans=0.0 2024-08-20 02:27:41,976 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 20 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 02:28:11,650 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 02:28:19,445 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3500, loss[loss=0.1056, beats_loss=0.009257, ecapa_loss=0.0001108, whisper_loss=0.09526, over 19174.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001419, whisper_loss=0.09083, over 3861971.36 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:28:20,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4628800.0, ans=0.0 2024-08-20 02:28:21,437 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
19 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-20 02:28:42,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4628900.0, ans=0.0 2024-08-20 02:28:58,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4629000.0, ans=0.125 2024-08-20 02:29:06,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4629000.0, ans=0.0 2024-08-20 02:29:09,598 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 02:29:15,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4629100.0, ans=0.0 2024-08-20 02:29:30,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4629200.0, ans=0.1 2024-08-20 02:29:44,371 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3550, loss[loss=0.1004, beats_loss=0.01134, ecapa_loss=0.0001033, whisper_loss=0.088, over 16594.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001412, whisper_loss=0.09078, over 3860697.88 frames. ], batch size: 62, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:29:51,498 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 02:29:52,955 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
20 from LS+wenet, 15 from Vox, 55 fro AS 2024-08-20 02:29:56,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.386e+01 2.605e+01 2.983e+01 3.766e+02, threshold=5.211e+01, percent-clipped=1.0 2024-08-20 02:30:29,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4629500.0, ans=0.0 2024-08-20 02:30:55,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-20 02:31:24,047 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3600, loss[loss=0.07716, beats_loss=0.01219, ecapa_loss=0.0001382, whisper_loss=0.06359, over 21209.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.0904, over 3883978.54 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:31:26,906 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 02:31:49,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4629900.0, ans=0.05 2024-08-20 02:32:20,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2024-08-20 02:32:21,436 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-20 02:32:25,116 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 02:32:25,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2024-08-20 02:32:33,419 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
12 from LS+wenet, 36 from Vox, 26 fro AS 2024-08-20 02:32:34,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=22.5 2024-08-20 02:32:39,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4630100.0, ans=0.0 2024-08-20 02:32:45,165 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:33:05,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4630200.0, ans=0.0 2024-08-20 02:33:15,259 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3650, loss[loss=0.1097, beats_loss=0.008391, ecapa_loss=0.0001086, whisper_loss=0.1002, over 14847.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.000138, whisper_loss=0.08951, over 3864243.22 frames. ], batch size: 54, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:33:17,710 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 02:33:29,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.219e+01 2.439e+01 2.661e+01 4.108e+01, threshold=4.879e+01, percent-clipped=0.0 2024-08-20 02:33:34,104 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:34:00,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. 
limit=15.0 2024-08-20 02:34:12,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4630500.0, ans=0.035 2024-08-20 02:34:28,963 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.334e-02 2024-08-20 02:34:42,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4630700.0, ans=0.125 2024-08-20 02:34:44,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4630700.0, ans=0.125 2024-08-20 02:35:03,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4630700.0, ans=0.125 2024-08-20 02:35:06,823 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3700, loss[loss=0.116, beats_loss=0.009914, ecapa_loss=0.0001548, whisper_loss=0.1045, over 14183.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001389, whisper_loss=0.08963, over 3849575.66 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:35:10,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4630800.0, ans=0.125 2024-08-20 02:35:16,040 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 20 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 02:35:20,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4630800.0, ans=0.015 2024-08-20 02:35:25,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4630800.0, ans=0.05 2024-08-20 02:35:39,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.34 vs. 
limit=15.0 2024-08-20 02:35:58,491 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 02:36:13,289 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 02:36:34,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4631200.0, ans=0.0 2024-08-20 02:36:37,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4631200.0, ans=0.0 2024-08-20 02:36:53,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4631200.0, ans=0.125 2024-08-20 02:36:58,529 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3750, loss[loss=0.09098, beats_loss=0.01089, ecapa_loss=0.0001398, whisper_loss=0.07869, over 22280.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001401, whisper_loss=0.08988, over 3796719.88 frames. ], batch size: 95, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:37:05,816 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
23 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 02:37:11,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4631300.0, ans=0.125 2024-08-20 02:37:13,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.670e+01 2.276e+01 2.480e+01 2.901e+01 4.929e+01, threshold=4.959e+01, percent-clipped=1.0 2024-08-20 02:37:17,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4631400.0, ans=0.1 2024-08-20 02:37:30,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4631400.0, ans=0.2 2024-08-20 02:37:45,978 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 02:37:54,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4631500.0, ans=0.125 2024-08-20 02:38:09,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-08-20 02:38:15,754 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 02:38:31,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2024-08-20 02:38:41,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4631700.0, ans=0.2 2024-08-20 02:38:47,582 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3800, loss[loss=0.09941, beats_loss=0.01221, ecapa_loss=0.0001237, whisper_loss=0.08596, over 19712.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01043, ecapa_loss=0.0001404, whisper_loss=0.08894, over 3801084.86 frames. 
], batch size: 79, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:38:52,760 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 15 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 02:38:53,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4631800.0, ans=0.125 2024-08-20 02:38:54,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4631800.0, ans=0.125 2024-08-20 02:38:58,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2024-08-20 02:39:17,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4631900.0, ans=0.035 2024-08-20 02:39:44,421 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 02:39:44,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4632000.0, ans=0.1 2024-08-20 02:40:40,662 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3850, loss[loss=0.09289, beats_loss=0.01179, ecapa_loss=0.0001488, whisper_loss=0.07961, over 22004.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001419, whisper_loss=0.08911, over 3801122.75 frames. 
], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:40:45,870 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:40:47,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4632300.0, ans=0.1 2024-08-20 02:40:49,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4632300.0, ans=0.07 2024-08-20 02:40:55,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.447e+01 2.712e+01 3.131e+01 3.132e+02, threshold=5.425e+01, percent-clipped=6.0 2024-08-20 02:40:57,840 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 31 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 02:41:00,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4632400.0, ans=0.035 2024-08-20 02:41:02,441 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 02:41:09,321 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 02:41:09,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4632400.0, ans=0.125 2024-08-20 02:41:11,255 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 25 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 02:41:46,728 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 02:41:57,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4632600.0, ans=0.0 2024-08-20 02:42:26,920 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3900, loss[loss=0.1016, beats_loss=0.009372, ecapa_loss=0.0001383, whisper_loss=0.09086, over 14316.00 frames. 
], tot_loss[loss=0.1003, beats_loss=0.01052, ecapa_loss=0.0001423, whisper_loss=0.08838, over 3789614.02 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:42:29,468 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 02:42:35,062 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 17 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 02:42:37,189 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 24 from LS+wenet, 10 from Vox, 51 fro AS 2024-08-20 02:42:39,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4632800.0, ans=0.0 2024-08-20 02:42:41,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4632800.0, ans=0.1 2024-08-20 02:42:45,471 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 02:42:45,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4632900.0, ans=0.125 2024-08-20 02:42:54,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4632900.0, ans=0.125 2024-08-20 02:43:37,755 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 02:44:17,841 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 3950, loss[loss=0.09219, beats_loss=0.01013, ecapa_loss=0.0001647, whisper_loss=0.08042, over 20568.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001412, whisper_loss=0.08896, over 3805472.98 frames. 
], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:44:18,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4633300.0, ans=0.125 2024-08-20 02:44:27,275 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.581e-02 2024-08-20 02:44:31,090 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 35 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 02:44:33,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.316e+01 2.520e+01 2.771e+01 2.265e+02, threshold=5.040e+01, percent-clipped=2.0 2024-08-20 02:45:00,670 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 27 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 02:45:20,989 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 22 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 02:46:05,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4633700.0, ans=0.0 2024-08-20 02:46:09,264 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4000, loss[loss=0.0986, beats_loss=0.01105, ecapa_loss=0.0001565, whisper_loss=0.08598, over 21846.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.08969, over 3852630.83 frames. ], batch size: 89, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:46:15,961 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 12 from LS+wenet, 11 from Vox, 41 fro AS 2024-08-20 02:46:34,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=12.0 2024-08-20 02:46:38,585 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 02:46:54,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4634000.0, ans=0.0 2024-08-20 02:47:11,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4634000.0, ans=0.035 2024-08-20 02:47:16,208 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 12 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 02:47:27,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4634100.0, ans=0.125 2024-08-20 02:47:28,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=12.0 2024-08-20 02:47:33,106 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 28 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-20 02:47:46,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4634200.0, ans=0.1 2024-08-20 02:48:05,837 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4050, loss[loss=0.1077, beats_loss=0.00904, ecapa_loss=0.0001887, whisper_loss=0.09675, over 13665.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001411, whisper_loss=0.08995, over 3820039.97 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:48:06,105 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 02:48:13,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4634300.0, ans=0.125 2024-08-20 02:48:22,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.303e+01 2.496e+01 2.881e+01 4.421e+01, threshold=4.993e+01, percent-clipped=0.0 2024-08-20 02:48:30,537 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 19 from LS+wenet, 16 from Vox, 14 fro AS 2024-08-20 02:48:42,420 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=8.789e-03 2024-08-20 02:48:47,271 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 02:49:00,099 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 02:49:07,597 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 33 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-20 02:49:09,963 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 31 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 02:49:29,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-08-20 02:49:35,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4634600.0, ans=0.0 2024-08-20 02:50:06,617 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4100, loss[loss=0.09852, beats_loss=0.01219, ecapa_loss=0.0001104, whisper_loss=0.08523, over 18547.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.09075, over 3820221.22 frames. 
], batch size: 75, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:50:16,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4634800.0, ans=0.1 2024-08-20 02:50:30,349 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 02:50:51,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4635000.0, ans=0.125 2024-08-20 02:50:59,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4635000.0, ans=0.125 2024-08-20 02:51:07,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4635000.0, ans=0.0 2024-08-20 02:51:12,232 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-20 02:51:27,636 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 22 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 02:51:27,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4635100.0, ans=0.1 2024-08-20 02:51:30,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4635100.0, ans=0.0 2024-08-20 02:51:32,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.83 vs. limit=22.5 2024-08-20 02:51:41,185 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 02:51:53,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4635200.0, ans=0.125 2024-08-20 02:52:01,059 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4150, loss[loss=0.1172, beats_loss=0.008219, ecapa_loss=0.0001162, whisper_loss=0.1079, over 21854.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.0912, over 3803814.36 frames. ], batch size: 79, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:52:06,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4635300.0, ans=0.125 2024-08-20 02:52:08,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4635300.0, ans=15.0 2024-08-20 02:52:16,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.376e+01 2.677e+01 2.991e+01 4.680e+01, threshold=5.353e+01, percent-clipped=0.0 2024-08-20 02:52:16,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4635300.0, ans=0.125 2024-08-20 02:52:29,849 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 02:52:38,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4635400.0, ans=0.0 2024-08-20 02:52:40,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4635400.0, ans=0.07 2024-08-20 02:52:45,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4635500.0, ans=0.125 2024-08-20 02:52:56,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2024-08-20 02:53:15,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0 2024-08-20 02:53:22,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4635600.0, ans=0.125 2024-08-20 02:53:52,210 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4200, loss[loss=0.1191, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.1074, over 21701.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001417, whisper_loss=0.09144, over 3811677.44 frames. ], batch size: 88, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:55:16,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4636100.0, ans=0.125 2024-08-20 02:55:25,171 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 02:55:42,003 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 02:55:48,360 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4250, loss[loss=0.09999, beats_loss=0.009194, ecapa_loss=0.0001373, whisper_loss=0.08942, over 22764.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001407, whisper_loss=0.09123, over 3833217.98 frames. ], batch size: 92, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:55:55,757 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 27 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 02:56:06,340 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.193e+01 2.444e+01 2.797e+01 4.359e+01, threshold=4.889e+01, percent-clipped=0.0 2024-08-20 02:56:13,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4636400.0, ans=0.1 2024-08-20 02:56:41,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4636500.0, ans=0.125 2024-08-20 02:56:55,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4636500.0, ans=0.125 2024-08-20 02:57:11,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4636600.0, ans=0.0 2024-08-20 02:57:34,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4636700.0, ans=0.0 2024-08-20 02:57:36,705 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 02:57:48,313 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4300, loss[loss=0.1152, beats_loss=0.008264, ecapa_loss=0.0001305, whisper_loss=0.1056, over 17578.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001408, whisper_loss=0.09064, over 3843826.70 frames. 
], batch size: 68, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 02:57:54,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4636800.0, ans=0.125 2024-08-20 02:58:04,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-08-20 02:58:19,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4636900.0, ans=0.035 2024-08-20 02:58:22,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4636900.0, ans=0.0 2024-08-20 02:58:22,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4636900.0, ans=0.1 2024-08-20 02:58:34,193 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 02:58:39,808 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 02:59:00,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4637100.0, ans=0.0 2024-08-20 02:59:13,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2024-08-20 02:59:22,749 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 02:59:47,595 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 23 from LS+wenet, 12 from Vox, 19 fro AS 2024-08-20 02:59:52,051 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4350, loss[loss=0.1014, beats_loss=0.009959, ecapa_loss=0.0001273, whisper_loss=0.09014, over 14652.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.09064, over 3825114.44 frames. ], batch size: 58, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:00:08,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.299e+01 2.481e+01 2.858e+01 4.859e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 03:00:14,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4637400.0, ans=0.1 2024-08-20 03:00:27,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4637400.0, ans=0.125 2024-08-20 03:00:28,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=8.0 2024-08-20 03:00:43,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4637500.0, ans=0.125 2024-08-20 03:00:50,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4637500.0, ans=0.125 2024-08-20 03:01:01,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2024-08-20 03:01:37,354 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 03:01:53,488 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4400, loss[loss=0.07412, beats_loss=0.01353, ecapa_loss=0.0001309, whisper_loss=0.05928, over 22893.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001399, whisper_loss=0.08909, over 3814700.83 frames. 
], batch size: 95, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:01:56,753 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.547e-01 2024-08-20 03:02:10,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4637800.0, ans=0.5 2024-08-20 03:02:31,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4637900.0, ans=0.1 2024-08-20 03:02:41,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4638000.0, ans=0.025 2024-08-20 03:03:10,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4638100.0, ans=0.0 2024-08-20 03:03:10,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4638100.0, ans=0.125 2024-08-20 03:03:17,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4638100.0, ans=0.025 2024-08-20 03:03:18,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2024-08-20 03:03:28,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4638200.0, ans=0.035 2024-08-20 03:03:33,376 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 03:03:49,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.79 vs. 
limit=22.5 2024-08-20 03:03:56,298 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4450, loss[loss=0.1006, beats_loss=0.01331, ecapa_loss=0.0001573, whisper_loss=0.08568, over 18359.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001395, whisper_loss=0.0892, over 3791122.91 frames. ], batch size: 75, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:03:56,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4638300.0, ans=0.1 2024-08-20 03:04:12,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.659e+01 2.158e+01 2.452e+01 2.719e+01 3.768e+01, threshold=4.904e+01, percent-clipped=0.0 2024-08-20 03:04:40,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4638400.0, ans=0.1 2024-08-20 03:05:04,672 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 03:05:23,379 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 20 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-20 03:05:57,818 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 19 from LS+wenet, 20 from Vox, 13 fro AS 2024-08-20 03:06:00,011 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4500, loss[loss=0.1135, beats_loss=0.006122, ecapa_loss=0.0001777, whisper_loss=0.1056, over 12943.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.08872, over 3794388.11 frames. 
], batch size: 52, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:06:05,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4638800.0, ans=0.125 2024-08-20 03:06:07,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4638800.0, ans=0.1 2024-08-20 03:06:09,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4638800.0, ans=0.0 2024-08-20 03:06:47,778 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 03:07:19,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.22 vs. limit=10.0 2024-08-20 03:07:27,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-08-20 03:07:29,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4639100.0, ans=0.2 2024-08-20 03:08:05,424 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4550, loss[loss=0.1277, beats_loss=0.008439, ecapa_loss=0.0001325, whisper_loss=0.118, over 22592.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01044, ecapa_loss=0.0001398, whisper_loss=0.0888, over 3793859.14 frames. ], batch size: 83, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:08:13,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.31 vs. 
limit=22.5 2024-08-20 03:08:23,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.329e+01 2.605e+01 2.856e+01 5.309e+01, threshold=5.211e+01, percent-clipped=1.0 2024-08-20 03:08:31,078 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 03:08:45,156 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-20 03:09:07,477 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 20 from LS+wenet, 37 from Vox, 31 fro AS 2024-08-20 03:09:17,674 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 03:09:23,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4639600.0, ans=0.95 2024-08-20 03:09:30,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4639600.0, ans=0.125 2024-08-20 03:09:43,463 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 14 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 03:09:56,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4639700.0, ans=0.125 2024-08-20 03:10:13,198 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4600, loss[loss=0.102, beats_loss=0.0109, ecapa_loss=0.0001398, whisper_loss=0.08973, over 16076.00 frames. ], tot_loss[loss=0.09976, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.08789, over 3749641.01 frames. 
], batch size: 64, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:10:13,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=4639800.0, ans=0.1 2024-08-20 03:10:18,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4639800.0, ans=0.125 2024-08-20 03:10:43,715 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 03:10:46,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-08-20 03:10:57,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4639900.0, ans=0.0 2024-08-20 03:11:01,897 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-464000.pt 2024-08-20 03:11:13,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4640000.0, ans=0.125 2024-08-20 03:11:19,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.52 vs. limit=22.5 2024-08-20 03:11:39,060 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 03:11:39,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.33 vs. 
limit=15.0 2024-08-20 03:11:56,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-20 03:12:06,193 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 03:12:09,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4640200.0, ans=22.5 2024-08-20 03:12:24,831 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4650, loss[loss=0.1184, beats_loss=0.009554, ecapa_loss=0.0001398, whisper_loss=0.1075, over 22388.00 frames. ], tot_loss[loss=0.0997, beats_loss=0.01049, ecapa_loss=0.0001391, whisper_loss=0.08782, over 3778994.49 frames. ], batch size: 91, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:12:39,270 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 03:12:41,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.331e+01 2.446e+01 2.750e+01 3.848e+01, threshold=4.892e+01, percent-clipped=0.0 2024-08-20 03:12:59,534 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 03:13:11,005 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 29 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 03:13:13,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4640500.0, ans=0.125 2024-08-20 03:13:20,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4640500.0, ans=0.0 2024-08-20 03:13:28,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4640500.0, ans=0.1 2024-08-20 03:13:43,891 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 03:13:52,041 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 03:13:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4640600.0, ans=0.125 2024-08-20 03:13:59,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4640600.0, ans=0.125 2024-08-20 03:14:02,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4640700.0, ans=0.2 2024-08-20 03:14:07,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4640700.0, ans=0.125 2024-08-20 03:14:18,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=22.5 2024-08-20 03:14:30,530 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4700, loss[loss=0.1087, beats_loss=0.008145, ecapa_loss=0.0001902, whisper_loss=0.09863, over 20031.00 frames. ], tot_loss[loss=0.09991, beats_loss=0.01047, ecapa_loss=0.0001406, whisper_loss=0.08804, over 3778602.10 frames. ], batch size: 85, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:14:36,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4640800.0, ans=0.125 2024-08-20 03:14:36,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4640800.0, ans=0.0 2024-08-20 03:14:43,269 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 03:14:56,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2024-08-20 03:15:24,474 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 03:15:59,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4641100.0, ans=0.2 2024-08-20 03:16:19,677 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 03:16:30,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4641200.0, ans=0.2 2024-08-20 03:16:34,886 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4750, loss[loss=0.0884, beats_loss=0.01124, ecapa_loss=0.0001072, whisper_loss=0.07608, over 14643.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01042, ecapa_loss=0.0001421, whisper_loss=0.08841, over 3758936.03 frames. ], batch size: 55, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:16:53,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.351e+01 2.626e+01 2.955e+01 4.641e+01, threshold=5.251e+01, percent-clipped=0.0 2024-08-20 03:16:57,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4641400.0, ans=0.125 2024-08-20 03:17:15,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4641400.0, ans=0.0 2024-08-20 03:17:15,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.92 vs. limit=6.0 2024-08-20 03:17:37,540 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 03:18:04,558 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-20 03:18:07,644 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-20 03:18:31,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4641700.0, ans=0.125 2024-08-20 03:18:36,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5 2024-08-20 03:18:40,927 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4800, loss[loss=0.09425, beats_loss=0.01084, ecapa_loss=0.000125, whisper_loss=0.08216, over 17920.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01034, ecapa_loss=0.0001425, whisper_loss=0.0892, over 3770465.37 frames. ], batch size: 71, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:18:51,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4641800.0, ans=0.125 2024-08-20 03:19:01,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4641800.0, ans=0.2 2024-08-20 03:19:01,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4641800.0, ans=0.125 2024-08-20 03:19:01,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4641800.0, ans=0.025 2024-08-20 03:19:13,189 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
22 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-20 03:19:40,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4642000.0, ans=0.0 2024-08-20 03:20:04,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4642100.0, ans=0.125 2024-08-20 03:20:46,507 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4850, loss[loss=0.07486, beats_loss=0.01321, ecapa_loss=0.0001511, whisper_loss=0.06014, over 21556.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001433, whisper_loss=0.0898, over 3798193.04 frames. ], batch size: 93, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:20:46,703 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 03:21:02,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.316e+01 2.589e+01 3.055e+01 7.163e+01, threshold=5.178e+01, percent-clipped=1.0 2024-08-20 03:21:13,477 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 03:21:58,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-08-20 03:22:04,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2024-08-20 03:22:24,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4642700.0, ans=0.1 2024-08-20 03:22:35,268 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4900, loss[loss=0.08503, beats_loss=0.01145, ecapa_loss=0.0001659, whisper_loss=0.07191, over 22052.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001438, whisper_loss=0.08935, over 3793808.20 frames. ], batch size: 94, lr: 1.92e-03, grad_scale: 5.764607523034235e+17 2024-08-20 03:23:16,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4643000.0, ans=0.1 2024-08-20 03:23:17,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4643000.0, ans=0.125 2024-08-20 03:23:28,088 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 03:23:41,814 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 40 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 03:23:48,312 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 03:24:20,429 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 4950, loss[loss=0.0972, beats_loss=0.009173, ecapa_loss=0.000153, whisper_loss=0.0865, over 21308.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01046, ecapa_loss=0.0001426, whisper_loss=0.08885, over 3820941.58 frames. ], batch size: 87, lr: 1.92e-03, grad_scale: 1.152921504606847e+18 2024-08-20 03:24:34,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.306e+01 2.561e+01 2.855e+01 3.879e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-20 03:25:11,329 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 03:25:11,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.78 vs. limit=5.0 2024-08-20 03:25:24,923 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 03:25:50,447 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
20 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-20 03:25:55,356 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5000, loss[loss=0.0914, beats_loss=0.01068, ecapa_loss=0.0001841, whisper_loss=0.07888, over 19743.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001432, whisper_loss=0.08914, over 3795026.66 frames. ], batch size: 84, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:25:55,592 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 31 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 03:26:07,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4643800.0, ans=0.2 2024-08-20 03:26:11,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2024-08-20 03:26:14,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4643900.0, ans=0.1 2024-08-20 03:26:27,357 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 15 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 03:26:32,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4644000.0, ans=0.0 2024-08-20 03:26:32,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-08-20 03:26:36,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4644000.0, ans=0.025 2024-08-20 03:26:39,880 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 03:26:41,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4644000.0, ans=0.0 2024-08-20 03:26:53,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4644100.0, ans=0.125 2024-08-20 03:27:00,554 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 23 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-20 03:27:25,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4644300.0, ans=0.125 2024-08-20 03:27:27,630 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5050, loss[loss=0.1013, beats_loss=0.01024, ecapa_loss=0.0001592, whisper_loss=0.08946, over 15063.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01037, ecapa_loss=0.0001432, whisper_loss=0.08878, over 3788704.00 frames. ], batch size: 61, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:27:31,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4644300.0, ans=0.0 2024-08-20 03:27:33,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4644300.0, ans=10.0 2024-08-20 03:27:36,940 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-20 03:27:44,294 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.280e+01 2.515e+01 2.844e+01 3.725e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-20 03:28:05,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4644500.0, ans=0.125 2024-08-20 03:28:12,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4644500.0, ans=0.125 2024-08-20 03:28:14,293 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 03:28:14,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4644500.0, ans=0.125 2024-08-20 03:28:18,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4644500.0, ans=0.125 2024-08-20 03:28:27,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-08-20 03:28:50,752 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 03:28:55,852 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-20 03:28:57,057 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5100, loss[loss=0.08919, beats_loss=0.009778, ecapa_loss=0.0001601, whisper_loss=0.07781, over 13009.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01039, ecapa_loss=0.0001427, whisper_loss=0.0889, over 3800761.78 frames. 
], batch size: 50, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:28:57,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4644800.0, ans=0.125 2024-08-20 03:28:57,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4644800.0, ans=22.5 2024-08-20 03:29:01,198 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 03:29:01,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4644800.0, ans=0.2 2024-08-20 03:29:07,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0 2024-08-20 03:29:19,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.40 vs. limit=10.0 2024-08-20 03:29:31,534 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 03:29:31,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4645000.0, ans=0.125 2024-08-20 03:29:36,645 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 31 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 03:29:38,902 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 03:29:42,865 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 03:29:49,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4645000.0, ans=0.2 2024-08-20 03:29:52,033 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 03:30:04,609 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 03:30:10,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4645200.0, ans=0.125 2024-08-20 03:30:24,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4645200.0, ans=0.125 2024-08-20 03:30:27,059 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5150, loss[loss=0.1207, beats_loss=0.007868, ecapa_loss=0.000186, whisper_loss=0.111, over 22740.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001428, whisper_loss=0.08914, over 3831105.97 frames. ], batch size: 91, lr: 1.92e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:30:33,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4645300.0, ans=0.09899494936611666 2024-08-20 03:30:38,508 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 30 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 03:30:42,418 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.233e+01 2.389e+01 2.694e+01 3.675e+01, threshold=4.778e+01, percent-clipped=0.0 2024-08-20 03:30:47,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2024-08-20 03:31:09,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. 
limit=15.0 2024-08-20 03:31:18,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4645600.0, ans=0.125 2024-08-20 03:31:30,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4645600.0, ans=0.125 2024-08-20 03:31:37,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-20 03:31:45,837 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 03:31:53,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4645800.0, ans=0.125 2024-08-20 03:31:54,533 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5200, loss[loss=0.08484, beats_loss=0.009461, ecapa_loss=0.0001129, whisper_loss=0.07425, over 13857.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001423, whisper_loss=0.08917, over 3806065.59 frames. ], batch size: 52, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:31:58,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2024-08-20 03:32:25,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4645900.0, ans=0.04949747468305833 2024-08-20 03:32:27,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2024-08-20 03:32:37,991 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 03:32:39,701 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
32 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-20 03:32:40,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=12.0 2024-08-20 03:32:43,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4646000.0, ans=0.0 2024-08-20 03:32:55,970 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-20 03:33:15,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4646200.0, ans=0.125 2024-08-20 03:33:15,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4646200.0, ans=0.0 2024-08-20 03:33:17,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4646200.0, ans=0.125 2024-08-20 03:33:24,350 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5250, loss[loss=0.1053, beats_loss=0.01175, ecapa_loss=0.0001225, whisper_loss=0.09237, over 22527.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001421, whisper_loss=0.08966, over 3811200.06 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:33:35,394 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
11 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 03:33:39,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4646300.0, ans=0.1 2024-08-20 03:33:40,163 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.321e+01 2.600e+01 2.824e+01 7.148e+01, threshold=5.200e+01, percent-clipped=2.0 2024-08-20 03:33:47,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4646400.0, ans=0.2 2024-08-20 03:34:19,943 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 21 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-20 03:34:20,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0 2024-08-20 03:34:25,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4646600.0, ans=0.0 2024-08-20 03:34:34,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4646700.0, ans=0.125 2024-08-20 03:34:40,402 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.375e+00 2024-08-20 03:34:50,362 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.483e+05 2024-08-20 03:34:54,859 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 24 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 03:34:55,852 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5300, loss[loss=0.1148, beats_loss=0.009436, ecapa_loss=0.0001352, whisper_loss=0.104, over 17101.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08995, over 3787613.17 frames. 
], batch size: 65, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:35:03,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4646800.0, ans=0.1 2024-08-20 03:35:12,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4646800.0, ans=0.125 2024-08-20 03:35:13,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4646900.0, ans=0.125 2024-08-20 03:35:28,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4646900.0, ans=0.035 2024-08-20 03:35:35,273 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-20 03:35:37,663 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 14 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 03:35:58,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4647100.0, ans=0.125 2024-08-20 03:36:36,418 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5350, loss[loss=0.09565, beats_loss=0.009665, ecapa_loss=0.0001423, whisper_loss=0.08456, over 15459.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.08971, over 3730540.61 frames. ], batch size: 60, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:36:57,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.184e+01 2.426e+01 2.687e+01 4.168e+01, threshold=4.852e+01, percent-clipped=0.0 2024-08-20 03:37:17,147 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 03:37:20,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4647400.0, ans=0.1 2024-08-20 03:37:33,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4647500.0, ans=0.2 2024-08-20 03:37:38,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=12.0 2024-08-20 03:37:54,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4647600.0, ans=0.125 2024-08-20 03:38:10,719 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 03:38:30,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4647700.0, ans=0.125 2024-08-20 03:38:35,821 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5400, loss[loss=0.09911, beats_loss=0.01047, ecapa_loss=0.0001445, whisper_loss=0.0872, over 20542.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.00014, whisper_loss=0.09049, over 3742619.58 frames. ], batch size: 84, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:38:40,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.30 vs. limit=10.0 2024-08-20 03:38:47,382 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 18 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-20 03:38:49,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4647800.0, ans=0.1 2024-08-20 03:39:01,386 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 03:39:05,698 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 03:39:12,558 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 03:39:21,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4648000.0, ans=0.125 2024-08-20 03:39:54,427 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 03:39:54,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4648100.0, ans=0.125 2024-08-20 03:40:06,096 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 03:40:12,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4648200.0, ans=0.0 2024-08-20 03:40:26,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4648300.0, ans=0.125 2024-08-20 03:40:28,659 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5450, loss[loss=0.09507, beats_loss=0.01138, ecapa_loss=0.0001355, whisper_loss=0.08234, over 16160.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001394, whisper_loss=0.08979, over 3736891.77 frames. 
], batch size: 64, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:40:45,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.272e+01 2.507e+01 2.790e+01 3.633e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-20 03:40:54,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4648400.0, ans=0.2 2024-08-20 03:41:14,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4648500.0, ans=0.07 2024-08-20 03:41:19,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4648500.0, ans=0.2 2024-08-20 03:41:42,279 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-20 03:42:01,179 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 12 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 03:42:06,369 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 15 from LS+wenet, 7 from Vox, 30 fro AS 2024-08-20 03:42:14,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4648700.0, ans=0.1 2024-08-20 03:42:16,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2024-08-20 03:42:18,160 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5500, loss[loss=0.1078, beats_loss=0.008966, ecapa_loss=0.0001442, whisper_loss=0.09741, over 15090.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001385, whisper_loss=0.09004, over 3793534.22 frames. 
], batch size: 57, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:42:23,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4648800.0, ans=0.0 2024-08-20 03:42:50,457 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 03:43:21,370 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 03:43:42,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-08-20 03:43:46,277 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 03:43:55,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4649200.0, ans=0.125 2024-08-20 03:44:11,985 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5550, loss[loss=0.08883, beats_loss=0.01272, ecapa_loss=0.0001136, whisper_loss=0.07497, over 23067.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001385, whisper_loss=0.09029, over 3800335.12 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:44:21,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4649300.0, ans=0.125 2024-08-20 03:44:35,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.290e+01 2.579e+01 2.821e+01 2.823e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-20 03:45:18,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4649500.0, ans=0.1 2024-08-20 03:45:28,933 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-20 03:45:35,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4649600.0, ans=0.125 2024-08-20 03:46:11,403 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5600, loss[loss=0.1132, beats_loss=0.01006, ecapa_loss=0.000137, whisper_loss=0.1018, over 22502.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01026, ecapa_loss=0.0001395, whisper_loss=0.09069, over 3811404.19 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:46:51,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4650000.0, ans=0.0 2024-08-20 03:46:54,160 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 03:47:01,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2024-08-20 03:47:02,963 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 17 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 03:47:05,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.63 vs. limit=22.5 2024-08-20 03:47:14,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-08-20 03:47:48,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4650200.0, ans=0.125 2024-08-20 03:47:59,301 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5650, loss[loss=0.09751, beats_loss=0.009169, ecapa_loss=0.000153, whisper_loss=0.08681, over 21901.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01028, ecapa_loss=0.0001409, whisper_loss=0.09045, over 3826470.91 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:48:04,124 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.261e+05 2024-08-20 03:48:06,562 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 20 from LS+wenet, 26 from Vox, 47 fro AS 2024-08-20 03:48:20,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.429e+01 2.607e+01 2.937e+01 4.534e+02, threshold=5.214e+01, percent-clipped=3.0 2024-08-20 03:48:27,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4650400.0, ans=0.125 2024-08-20 03:48:51,292 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 03:49:54,588 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5700, loss[loss=0.08977, beats_loss=0.008913, ecapa_loss=0.0001644, whisper_loss=0.07921, over 15886.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.08977, over 3838183.25 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:50:01,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4650800.0, ans=0.07 2024-08-20 03:50:33,220 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 03:50:41,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4651000.0, ans=0.125 2024-08-20 03:50:52,973 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
21 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 03:51:41,478 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5750, loss[loss=0.08084, beats_loss=0.01206, ecapa_loss=0.0001138, whisper_loss=0.06765, over 13901.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001404, whisper_loss=0.08913, over 3848872.28 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:51:55,024 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 03:52:01,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.309e+01 2.653e+01 2.956e+01 1.340e+02, threshold=5.306e+01, percent-clipped=1.0 2024-08-20 03:52:01,757 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 25 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 03:52:25,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.43 vs. limit=15.0 2024-08-20 03:53:15,070 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 03:53:24,176 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 16 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 03:53:30,706 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5800, loss[loss=0.09394, beats_loss=0.01141, ecapa_loss=0.000117, whisper_loss=0.08136, over 21900.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001409, whisper_loss=0.08895, over 3867074.66 frames. 
], batch size: 87, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:53:33,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4651800.0, ans=0.0 2024-08-20 03:53:44,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4651800.0, ans=0.1 2024-08-20 03:54:02,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-08-20 03:54:08,320 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 03:54:34,840 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 03:54:49,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4652100.0, ans=0.125 2024-08-20 03:54:53,457 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 03:55:15,720 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5850, loss[loss=0.1163, beats_loss=0.009449, ecapa_loss=0.0001294, whisper_loss=0.1056, over 22738.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.08997, over 3842543.56 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:55:22,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4652300.0, ans=0.125 2024-08-20 03:55:30,901 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
28 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 03:55:33,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4652300.0, ans=0.125 2024-08-20 03:55:34,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.203e+01 2.512e+01 2.750e+01 3.616e+02, threshold=5.024e+01, percent-clipped=2.0 2024-08-20 03:55:36,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4652400.0, ans=0.2 2024-08-20 03:55:56,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4652500.0, ans=0.125 2024-08-20 03:55:58,269 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 03:55:58,511 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.521e+00 2024-08-20 03:56:07,373 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 03:56:12,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4652500.0, ans=0.1 2024-08-20 03:56:27,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4652600.0, ans=0.0 2024-08-20 03:56:41,093 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 03:56:47,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4652700.0, ans=0.0 2024-08-20 03:56:50,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.43 vs. 
limit=15.0 2024-08-20 03:56:52,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-08-20 03:57:05,954 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5900, loss[loss=0.1071, beats_loss=0.01023, ecapa_loss=0.0001543, whisper_loss=0.09537, over 19928.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001415, whisper_loss=0.09036, over 3825155.60 frames. ], batch size: 78, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:57:35,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4652900.0, ans=0.1 2024-08-20 03:57:41,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4652900.0, ans=0.0 2024-08-20 03:57:48,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4653000.0, ans=0.2 2024-08-20 03:58:07,894 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 03:58:10,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2024-08-20 03:58:17,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4653100.0, ans=0.125 2024-08-20 03:58:18,591 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 18 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 03:58:39,737 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 29 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 03:58:46,851 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 03:58:59,957 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 5950, loss[loss=0.1059, beats_loss=0.0116, ecapa_loss=0.0001186, whisper_loss=0.09308, over 20669.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001411, whisper_loss=0.09013, over 3802512.15 frames. ], batch size: 81, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 03:59:21,069 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.326e+01 2.621e+01 2.901e+01 3.816e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-20 04:00:02,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4653500.0, ans=0.125 2024-08-20 04:00:21,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4653600.0, ans=0.125 2024-08-20 04:00:23,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4653600.0, ans=0.1 2024-08-20 04:00:31,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4653700.0, ans=0.125 2024-08-20 04:00:49,264 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6000, loss[loss=0.1227, beats_loss=0.009976, ecapa_loss=0.0001584, whisper_loss=0.1111, over 22465.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.08916, over 3836920.40 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:00:49,265 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 04:01:25,893 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on ASR_libri: loss=0.2536, beats_loss=0, ecapa_loss=0.0005122, whisper_loss=0.2485, over 931116.00 frames. 
2024-08-20 04:01:50,559 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on SV_voxceleb1: loss=0.003973, beats_loss=0, ecapa_loss=0.0003973, whisper_loss=0, over 944235.00 frames. 2024-08-20 04:02:27,345 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5386, 2.9677, 2.4135, 2.1023, 2.1704, 2.0220, 2.4324, 2.4074], device='cuda:0') 2024-08-20 04:02:34,331 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9305, 4.7066, 4.8634, 4.8956], device='cuda:0') 2024-08-20 04:02:41,430 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6393, 2.0987, 2.3177, 1.6711, 1.8331, 2.4286, 2.8762, 1.9545], device='cuda:0') 2024-08-20 04:03:25,289 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 04:03:25,293 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 04:03:44,660 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 18 from LS+wenet, 27 from Vox, 47 fro AS 2024-08-20 04:03:44,882 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.544e-01 2024-08-20 04:03:49,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=12.0 2024-08-20 04:04:17,960 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 04:04:46,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4654200.0, ans=0.125 2024-08-20 04:04:54,626 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6050, loss[loss=0.1062, beats_loss=0.009651, ecapa_loss=0.0001625, whisper_loss=0.09488, over 19642.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001404, whisper_loss=0.08879, over 3865358.71 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:04:54,810 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 04:05:09,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.277e+01 2.536e+01 2.822e+01 4.959e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-20 04:05:14,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4654400.0, ans=0.2 2024-08-20 04:05:21,715 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 04:05:55,994 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 04:05:57,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4654600.0, ans=0.035 2024-08-20 04:06:05,170 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 27 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 04:06:07,445 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 27 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 04:06:24,005 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6100, loss[loss=0.1083, beats_loss=0.01116, ecapa_loss=0.0001607, whisper_loss=0.09554, over 20269.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001397, whisper_loss=0.08944, over 3873778.21 frames. 
], batch size: 81, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:06:32,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4654800.0, ans=0.0 2024-08-20 04:06:34,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4654800.0, ans=0.0 2024-08-20 04:06:38,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4654800.0, ans=0.0 2024-08-20 04:06:40,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4654800.0, ans=0.125 2024-08-20 04:06:45,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2024-08-20 04:06:53,498 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 23 from LS+wenet, 17 from Vox, 14 fro AS 2024-08-20 04:07:09,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4655000.0, ans=0.0 2024-08-20 04:07:18,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.11 vs. limit=6.0 2024-08-20 04:07:28,951 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 32 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 04:07:36,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4655100.0, ans=0.125 2024-08-20 04:07:40,436 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 26 from LS+wenet, 13 from Vox, 16 fro AS 2024-08-20 04:07:42,827 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
18 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-20 04:08:11,815 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6150, loss[loss=0.09989, beats_loss=0.008869, ecapa_loss=0.0001391, whisper_loss=0.08963, over 14340.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01072, ecapa_loss=0.0001399, whisper_loss=0.0897, over 3861330.76 frames. ], batch size: 58, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:08:21,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4655300.0, ans=0.0 2024-08-20 04:08:23,123 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 29 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 04:08:31,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.289e+01 2.520e+01 2.857e+01 4.942e+02, threshold=5.040e+01, percent-clipped=2.0 2024-08-20 04:08:31,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4655400.0, ans=0.0 2024-08-20 04:08:53,778 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 04:08:57,578 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 18 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-20 04:09:38,903 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 04:09:46,022 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 25 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 04:10:01,753 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6200, loss[loss=0.1023, beats_loss=0.01171, ecapa_loss=0.0001075, whisper_loss=0.08951, over 21670.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001393, whisper_loss=0.08973, over 3867762.18 frames. 
], batch size: 84, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:10:02,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=12.0 2024-08-20 04:11:01,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4656000.0, ans=0.0 2024-08-20 04:11:21,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4656100.0, ans=0.125 2024-08-20 04:11:22,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4656100.0, ans=0.1 2024-08-20 04:11:30,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2024-08-20 04:11:33,890 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 04:11:40,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2024-08-20 04:11:42,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-08-20 04:11:50,490 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6250, loss[loss=0.1153, beats_loss=0.01012, ecapa_loss=0.000167, whisper_loss=0.1035, over 22492.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01064, ecapa_loss=0.00014, whisper_loss=0.08995, over 3893946.71 frames. 
], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:12:09,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.244e+01 2.486e+01 2.895e+01 5.036e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-20 04:12:09,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4656400.0, ans=0.125 2024-08-20 04:12:19,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=4656400.0, ans=0.1 2024-08-20 04:12:32,722 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 04:13:06,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2024-08-20 04:13:11,850 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 04:13:27,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.88 vs. limit=22.5 2024-08-20 04:13:29,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4656700.0, ans=0.07 2024-08-20 04:13:41,014 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6300, loss[loss=0.1125, beats_loss=0.01005, ecapa_loss=0.0001349, whisper_loss=0.1011, over 19638.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001416, whisper_loss=0.08974, over 3843806.34 frames. ], batch size: 76, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:13:41,285 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 04:13:48,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4656800.0, ans=0.125 2024-08-20 04:14:00,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4656800.0, ans=0.125 2024-08-20 04:14:08,857 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 04:14:26,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4657000.0, ans=0.0 2024-08-20 04:14:39,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4657000.0, ans=0.1 2024-08-20 04:15:08,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2024-08-20 04:15:11,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4657200.0, ans=0.125 2024-08-20 04:15:22,087 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 04:15:36,369 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6350, loss[loss=0.09875, beats_loss=0.009224, ecapa_loss=0.0001455, whisper_loss=0.08808, over 24414.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.00014, whisper_loss=0.09009, over 3904180.69 frames. ], batch size: 97, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:15:56,331 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.231e+01 2.542e+01 2.829e+01 6.825e+01, threshold=5.084e+01, percent-clipped=1.0 2024-08-20 04:15:56,602 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
31 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 04:16:06,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4657400.0, ans=0.0 2024-08-20 04:16:17,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4657400.0, ans=0.125 2024-08-20 04:16:23,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4657500.0, ans=0.2 2024-08-20 04:16:25,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4657500.0, ans=0.0 2024-08-20 04:16:39,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4657600.0, ans=0.125 2024-08-20 04:16:55,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4657600.0, ans=0.0 2024-08-20 04:17:09,017 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 04:17:11,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4657700.0, ans=0.5 2024-08-20 04:17:26,690 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6400, loss[loss=0.1076, beats_loss=0.009266, ecapa_loss=0.0001401, whisper_loss=0.09689, over 17541.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.00014, whisper_loss=0.08939, over 3870973.74 frames. 
], batch size: 71, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:18:02,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4657900.0, ans=0.2 2024-08-20 04:18:17,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4658000.0, ans=0.1 2024-08-20 04:18:50,483 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 04:18:59,295 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 21 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-20 04:19:01,385 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 15 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 04:19:18,225 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6450, loss[loss=0.1214, beats_loss=0.009828, ecapa_loss=0.0001393, whisper_loss=0.1102, over 18606.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.08929, over 3835183.23 frames. ], batch size: 73, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:19:34,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4658300.0, ans=0.125 2024-08-20 04:19:38,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.209e+01 2.444e+01 2.735e+01 9.511e+01, threshold=4.888e+01, percent-clipped=1.0 2024-08-20 04:19:39,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4658400.0, ans=0.0 2024-08-20 04:19:42,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. 
limit=15.0 2024-08-20 04:19:52,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4658400.0, ans=0.2 2024-08-20 04:20:33,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4658600.0, ans=0.04949747468305833 2024-08-20 04:20:38,294 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 27 from LS+wenet, 15 from Vox, 37 from AS 2024-08-20 04:21:11,436 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6500, loss[loss=0.1168, beats_loss=0.007914, ecapa_loss=0.0001531, whisper_loss=0.1074, over 15964.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.000141, whisper_loss=0.08983, over 3830703.63 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:21:16,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5 2024-08-20 04:21:44,388 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 from AS 2024-08-20 04:21:50,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4659000.0, ans=0.0 2024-08-20 04:23:02,356 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6550, loss[loss=0.1081, beats_loss=0.01165, ecapa_loss=0.0001367, whisper_loss=0.09508, over 22152.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.0901, over 3860974.19 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:23:07,679 INFO [train_multi_KD3.py:845] (0/4) A total of 97 cuts. 29 from LS+wenet, 23 from Vox, 45 from AS 2024-08-20 04:23:22,440 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
40 from LS+wenet, 17 from Vox, 32 from AS 2024-08-20 04:23:23,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.308e+01 2.565e+01 2.877e+01 4.180e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 04:24:00,457 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.498e+01 2024-08-20 04:24:03,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4659500.0, ans=0.0 2024-08-20 04:24:40,294 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 from AS 2024-08-20 04:24:40,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4659700.0, ans=0.035 2024-08-20 04:24:42,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2024-08-20 04:24:54,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.04 vs. limit=22.5 2024-08-20 04:24:56,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-08-20 04:25:01,160 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6600, loss[loss=0.1164, beats_loss=0.009504, ecapa_loss=0.0001276, whisper_loss=0.1056, over 21513.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001404, whisper_loss=0.09071, over 3885112.74 frames. 
], batch size: 86, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:25:18,681 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 04:25:19,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4659900.0, ans=0.2 2024-08-20 04:26:14,912 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 23 from LS+wenet, 19 from Vox, 26 from AS 2024-08-20 04:26:16,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4660100.0, ans=0.125 2024-08-20 04:26:40,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4660200.0, ans=0.125 2024-08-20 04:26:44,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4660200.0, ans=0.0 2024-08-20 04:26:50,213 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 26 from LS+wenet, 15 from Vox, 25 from AS 2024-08-20 04:26:50,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2024-08-20 04:26:52,837 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6650, loss[loss=0.1193, beats_loss=0.00944, ecapa_loss=0.0001302, whisper_loss=0.1085, over 18035.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01033, ecapa_loss=0.0001421, whisper_loss=0.09142, over 3837626.78 frames. ], batch size: 66, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:27:05,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.40 vs. 
limit=15.0 2024-08-20 04:27:14,224 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.439e+01 2.716e+01 3.206e+01 5.057e+01, threshold=5.432e+01, percent-clipped=0.0 2024-08-20 04:27:31,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4660400.0, ans=0.125 2024-08-20 04:27:57,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-20 04:28:10,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.27 vs. limit=10.0 2024-08-20 04:28:14,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4660600.0, ans=0.1 2024-08-20 04:28:29,299 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS 2024-08-20 04:28:52,036 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6700, loss[loss=0.07424, beats_loss=0.01238, ecapa_loss=0.000118, whisper_loss=0.06069, over 17673.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01032, ecapa_loss=0.000142, whisper_loss=0.09165, over 3874959.63 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:28:58,974 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 26 from LS+wenet, 15 from Vox, 29 from AS 2024-08-20 04:29:08,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4660800.0, ans=0.125 2024-08-20 04:29:34,685 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 19 from LS+wenet, 27 from Vox, 33 from AS 2024-08-20 04:29:53,815 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 17 from LS+wenet, 17 from Vox, 29 from AS 2024-08-20 04:30:07,086 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
27 from LS+wenet, 26 from Vox, 31 from AS 2024-08-20 04:30:09,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4661100.0, ans=0.125 2024-08-20 04:30:11,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4661100.0, ans=0.125 2024-08-20 04:30:16,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-20 04:30:30,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4661200.0, ans=0.125 2024-08-20 04:30:32,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4661200.0, ans=0.125 2024-08-20 04:30:40,446 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.233e+01 2024-08-20 04:30:47,700 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 28 from LS+wenet, 20 from Vox, 33 from AS 2024-08-20 04:30:49,563 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6750, loss[loss=0.1095, beats_loss=0.009804, ecapa_loss=0.0001365, whisper_loss=0.09828, over 20854.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001427, whisper_loss=0.09087, over 3881508.86 frames. ], batch size: 81, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:31:03,399 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
34 from LS+wenet, 21 from Vox, 36 from AS 2024-08-20 04:31:08,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.262e+01 2.503e+01 2.805e+01 3.998e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-20 04:31:21,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4661400.0, ans=0.2 2024-08-20 04:31:21,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-08-20 04:31:35,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4661500.0, ans=0.0 2024-08-20 04:32:10,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2024-08-20 04:32:12,643 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 20 from LS+wenet, 12 from Vox, 27 from AS 2024-08-20 04:32:26,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-20 04:32:41,869 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6800, loss[loss=0.09304, beats_loss=0.01219, ecapa_loss=0.0001108, whisper_loss=0.07974, over 16395.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001425, whisper_loss=0.09035, over 3844880.84 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:32:44,254 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 22 from LS+wenet, 19 from Vox, 24 from AS 2024-08-20 04:32:49,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4661800.0, ans=0.125 2024-08-20 04:32:58,491 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
30 from LS+wenet, 25 from Vox, 28 from AS 2024-08-20 04:33:00,826 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 from AS 2024-08-20 04:33:18,722 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 21 from LS+wenet, 24 from Vox, 18 from AS 2024-08-20 04:33:47,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2024-08-20 04:34:12,297 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 21 from LS+wenet, 13 from Vox, 24 from AS 2024-08-20 04:34:33,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4662300.0, ans=0.125 2024-08-20 04:34:35,305 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6850, loss[loss=0.0946, beats_loss=0.01013, ecapa_loss=0.0001314, whisper_loss=0.08315, over 15328.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01028, ecapa_loss=0.000142, whisper_loss=0.09101, over 3833482.57 frames. ], batch size: 58, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:34:54,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4662300.0, ans=0.0 2024-08-20 04:34:55,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.273e+01 2.508e+01 2.881e+01 4.383e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 04:35:06,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4662400.0, ans=0.125 2024-08-20 04:35:13,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4662400.0, ans=0.1 2024-08-20 04:35:29,157 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
24 from LS+wenet, 20 from Vox, 30 from AS 2024-08-20 04:36:06,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4662700.0, ans=0.125 2024-08-20 04:36:27,890 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6900, loss[loss=0.1191, beats_loss=0.009295, ecapa_loss=0.000111, whisper_loss=0.1087, over 23359.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001409, whisper_loss=0.09128, over 3837642.09 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:37:06,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4662900.0, ans=0.0 2024-08-20 04:37:41,007 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 29 from LS+wenet, 13 from Vox, 32 from AS 2024-08-20 04:37:54,446 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 15 from LS+wenet, 13 from Vox, 24 from AS 2024-08-20 04:38:00,725 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 from AS 2024-08-20 04:38:14,715 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 6950, loss[loss=0.1068, beats_loss=0.009517, ecapa_loss=0.0001475, whisper_loss=0.09579, over 19521.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.09028, over 3834383.86 frames. ], batch size: 78, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 04:38:24,130 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS 2024-08-20 04:38:35,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.402e+01 2.667e+01 2.923e+01 3.663e+02, threshold=5.334e+01, percent-clipped=2.0 2024-08-20 04:38:44,405 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
21 from LS+wenet, 15 from Vox, 35 from AS 2024-08-20 04:39:10,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4663500.0, ans=0.0 2024-08-20 04:39:21,940 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 18 from LS+wenet, 25 from Vox, 20 from AS 2024-08-20 04:39:57,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=12.0 2024-08-20 04:39:58,160 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7000, loss[loss=0.107, beats_loss=0.007485, ecapa_loss=0.0001354, whisper_loss=0.09813, over 19525.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001416, whisper_loss=0.08999, over 3809106.35 frames. ], batch size: 75, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:40:29,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4663900.0, ans=0.125 2024-08-20 04:40:35,330 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 21 from LS+wenet, 8 from Vox, 25 from AS 2024-08-20 04:40:46,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4664000.0, ans=0.0 2024-08-20 04:40:50,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4664000.0, ans=0.05 2024-08-20 04:40:53,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.41 vs. 
limit=22.5 2024-08-20 04:40:59,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4664100.0, ans=0.125 2024-08-20 04:41:20,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4664200.0, ans=0.0 2024-08-20 04:41:28,060 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 19 from LS+wenet, 29 from Vox, 39 from AS 2024-08-20 04:41:31,587 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7050, loss[loss=0.1255, beats_loss=0.01007, ecapa_loss=0.000128, whisper_loss=0.1142, over 19286.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001413, whisper_loss=0.09, over 3803898.27 frames. ], batch size: 77, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:41:47,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.321e+01 2.580e+01 2.916e+01 2.806e+02, threshold=5.159e+01, percent-clipped=2.0 2024-08-20 04:42:01,671 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-20 04:42:09,323 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.366e+00 2024-08-20 04:42:16,144 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 from AS 2024-08-20 04:42:55,080 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS 2024-08-20 04:43:05,567 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7100, loss[loss=0.1157, beats_loss=0.009332, ecapa_loss=0.00013, whisper_loss=0.1051, over 18414.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.08982, over 3821473.77 frames. 
], batch size: 69, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:43:07,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4664800.0, ans=0.2 2024-08-20 04:43:25,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.33 vs. limit=22.5 2024-08-20 04:43:26,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4664900.0, ans=0.125 2024-08-20 04:43:30,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4664900.0, ans=0.125 2024-08-20 04:43:41,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-08-20 04:43:56,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4665000.0, ans=0.125 2024-08-20 04:44:03,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4665000.0, ans=0.07 2024-08-20 04:44:32,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4665200.0, ans=0.0 2024-08-20 04:44:43,470 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 22 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 04:44:57,407 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7150, loss[loss=0.1164, beats_loss=0.005697, ecapa_loss=0.000153, whisper_loss=0.1092, over 17223.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001412, whisper_loss=0.08945, over 3826223.41 frames. 
], batch size: 66, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:45:05,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2024-08-20 04:45:09,575 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 26 from LS+wenet, 25 from Vox, 31 from AS 2024-08-20 04:45:17,370 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.230e+01 2.408e+01 2.713e+01 4.387e+01, threshold=4.817e+01, percent-clipped=0.0 2024-08-20 04:45:20,950 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS 2024-08-20 04:45:38,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4665400.0, ans=0.125 2024-08-20 04:46:16,555 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 26 from LS+wenet, 19 from Vox, 27 from AS 2024-08-20 04:46:34,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-08-20 04:46:52,135 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7200, loss[loss=0.1188, beats_loss=0.009468, ecapa_loss=0.0001385, whisper_loss=0.108, over 22566.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.0001416, whisper_loss=0.08939, over 3836887.57 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:47:10,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-20 04:47:24,702 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 from AS 2024-08-20 04:47:39,051 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
24 from LS+wenet, 27 from Vox, 42 from AS 2024-08-20 04:47:41,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4666000.0, ans=0.2 2024-08-20 04:48:08,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4666100.0, ans=0.125 2024-08-20 04:48:22,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4666200.0, ans=0.0 2024-08-20 04:48:26,765 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 14 from LS+wenet, 23 from Vox, 28 from AS 2024-08-20 04:48:31,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4666200.0, ans=0.5 2024-08-20 04:48:38,151 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 from AS 2024-08-20 04:48:44,272 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7250, loss[loss=0.0948, beats_loss=0.009018, ecapa_loss=0.0001582, whisper_loss=0.0842, over 21824.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01031, ecapa_loss=0.0001419, whisper_loss=0.08976, over 3819152.72 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:48:59,493 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 21 from Vox, 17 from AS 2024-08-20 04:49:04,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.278e+01 2.449e+01 2.713e+01 3.965e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:49:11,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2024-08-20 04:49:16,862 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 
30 from LS+wenet, 13 from Vox, 39 from AS 2024-08-20 04:49:17,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4666400.0, ans=0.1 2024-08-20 04:49:19,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4666400.0, ans=0.125 2024-08-20 04:49:40,804 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 from AS 2024-08-20 04:49:41,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4666500.0, ans=0.125 2024-08-20 04:50:25,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4666700.0, ans=0.1 2024-08-20 04:50:25,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2024-08-20 04:50:29,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4666700.0, ans=0.2 2024-08-20 04:50:33,853 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7300, loss[loss=0.09168, beats_loss=0.01087, ecapa_loss=0.0001327, whisper_loss=0.07947, over 19340.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01035, ecapa_loss=0.0001424, whisper_loss=0.08915, over 3827027.64 frames. ], batch size: 77, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:51:06,402 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 from AS 2024-08-20 04:51:53,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. 
limit=15.0 2024-08-20 04:52:02,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4667100.0, ans=0.05 2024-08-20 04:52:13,932 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 from AS 2024-08-20 04:52:29,459 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7350, loss[loss=0.1058, beats_loss=0.01177, ecapa_loss=0.0001343, whisper_loss=0.09266, over 21071.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.000142, whisper_loss=0.08963, over 3847277.12 frames. ], batch size: 85, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:52:29,748 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 from AS 2024-08-20 04:52:41,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4667300.0, ans=0.125 2024-08-20 04:52:50,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.449e+01 2.717e+01 4.858e+01, threshold=4.897e+01, percent-clipped=0.0 2024-08-20 04:52:53,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4667400.0, ans=0.125 2024-08-20 04:53:18,341 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 
12 from LS+wenet, 15 from Vox, 22 from AS 2024-08-20 04:53:25,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4667500.0, ans=0.125 2024-08-20 04:53:34,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4667500.0, ans=0.2 2024-08-20 04:53:48,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4667600.0, ans=0.0 2024-08-20 04:54:20,628 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7400, loss[loss=0.1156, beats_loss=0.01098, ecapa_loss=0.0001332, whisper_loss=0.1033, over 17036.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001418, whisper_loss=0.09013, over 3847327.43 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:54:21,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4667800.0, ans=0.07 2024-08-20 04:54:56,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4667900.0, ans=0.125 2024-08-20 04:55:09,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4668000.0, ans=0.125 2024-08-20 04:55:26,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4668000.0, ans=0.125 2024-08-20 04:55:35,304 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS 2024-08-20 04:55:55,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. 
limit=15.0 2024-08-20 04:56:07,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4668200.0, ans=0.2 2024-08-20 04:56:18,302 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7450, loss[loss=0.09055, beats_loss=0.01077, ecapa_loss=0.0001099, whisper_loss=0.07869, over 18271.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01035, ecapa_loss=0.0001421, whisper_loss=0.08992, over 3840430.85 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:56:21,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4668300.0, ans=0.125 2024-08-20 04:56:29,900 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS 2024-08-20 04:56:39,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.202e+01 2.465e+01 2.731e+01 3.799e+01, threshold=4.929e+01, percent-clipped=0.0 2024-08-20 04:56:47,067 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 from AS 2024-08-20 04:56:47,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2024-08-20 04:56:57,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=15.0 2024-08-20 04:56:59,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4668400.0, ans=0.125 2024-08-20 04:57:04,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4668500.0, ans=0.2 2024-08-20 04:57:20,406 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
24 from LS+wenet, 17 from Vox, 42 from AS 2024-08-20 04:57:54,584 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 from AS 2024-08-20 04:57:54,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4668700.0, ans=0.1 2024-08-20 04:58:01,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.77 vs. limit=10.0 2024-08-20 04:58:06,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2024-08-20 04:58:12,396 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7500, loss[loss=0.131, beats_loss=0.01017, ecapa_loss=0.000128, whisper_loss=0.1195, over 23882.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.08982, over 3853129.97 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 04:58:33,227 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 32 from LS+wenet, 24 from Vox, 32 from AS 2024-08-20 04:59:09,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4669000.0, ans=0.0 2024-08-20 04:59:19,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4669100.0, ans=0.025 2024-08-20 04:59:29,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4669100.0, ans=0.5 2024-08-20 04:59:40,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. 
limit=6.0 2024-08-20 04:59:48,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4669200.0, ans=0.125 2024-08-20 05:00:05,034 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7550, loss[loss=0.1095, beats_loss=0.01035, ecapa_loss=0.0001544, whisper_loss=0.09761, over 23897.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001423, whisper_loss=0.08969, over 3868702.99 frames. ], batch size: 95, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:00:08,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4669300.0, ans=0.125 2024-08-20 05:00:08,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-08-20 05:00:15,138 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 05:00:22,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.290e+01 2.609e+01 3.060e+01 2.674e+02, threshold=5.218e+01, percent-clipped=1.0 2024-08-20 05:00:29,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4669400.0, ans=0.125 2024-08-20 05:01:10,662 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-20 05:01:55,961 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 05:01:57,851 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7600, loss[loss=0.1191, beats_loss=0.008972, ecapa_loss=0.0001323, whisper_loss=0.1088, over 23641.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.0897, over 3886407.48 frames. 
], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:01:58,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4669800.0, ans=0.0 2024-08-20 05:02:02,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4669800.0, ans=0.1 2024-08-20 05:02:17,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4669800.0, ans=0.2 2024-08-20 05:02:28,580 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 05:02:49,431 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 05:03:33,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4670200.0, ans=0.95 2024-08-20 05:03:37,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4670200.0, ans=0.125 2024-08-20 05:03:39,921 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-20 05:03:48,109 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7650, loss[loss=0.1082, beats_loss=0.01063, ecapa_loss=0.0001548, whisper_loss=0.09603, over 21595.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001421, whisper_loss=0.09058, over 3888891.71 frames. 
], batch size: 92, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:04:08,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.328e+01 2.537e+01 2.838e+01 5.582e+01, threshold=5.074e+01, percent-clipped=1.0 2024-08-20 05:04:11,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4670400.0, ans=0.95 2024-08-20 05:04:36,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2024-08-20 05:04:43,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4670500.0, ans=0.2 2024-08-20 05:04:47,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2024-08-20 05:04:51,156 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 05:05:14,595 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 17 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 05:05:32,574 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7700, loss[loss=0.1048, beats_loss=0.007661, ecapa_loss=0.0001603, whisper_loss=0.09553, over 20213.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.0903, over 3849209.65 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:05:49,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4670800.0, ans=0.125 2024-08-20 05:06:34,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. 
limit=22.5 2024-08-20 05:07:15,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4671200.0, ans=0.0 2024-08-20 05:07:17,338 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 05:07:24,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-20 05:07:25,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4671300.0, ans=0.2 2024-08-20 05:07:28,233 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7750, loss[loss=0.1091, beats_loss=0.007766, ecapa_loss=0.0001559, whisper_loss=0.09977, over 23057.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.09041, over 3823315.22 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:07:49,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.264e+01 2.430e+01 2.732e+01 4.233e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-20 05:08:24,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2024-08-20 05:08:56,344 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-20 05:08:56,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4671600.0, ans=0.0 2024-08-20 05:09:31,019 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7800, loss[loss=0.08337, beats_loss=0.01226, ecapa_loss=0.0001268, whisper_loss=0.06984, over 21468.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001387, whisper_loss=0.09006, over 3832594.48 frames. 
], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:09:39,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4671800.0, ans=0.0 2024-08-20 05:09:44,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4671800.0, ans=0.0 2024-08-20 05:09:49,015 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 05:09:58,272 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 05:10:08,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-20 05:11:05,961 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 05:11:12,579 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 30 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 05:11:12,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4672200.0, ans=0.1 2024-08-20 05:11:26,334 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7850, loss[loss=0.1023, beats_loss=0.01004, ecapa_loss=0.0001324, whisper_loss=0.09096, over 19553.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001386, whisper_loss=0.08958, over 3846666.05 frames. 
], batch size: 75, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:11:34,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4672300.0, ans=0.02 2024-08-20 05:11:46,581 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.268e+01 2.521e+01 2.830e+01 3.600e+01, threshold=5.042e+01, percent-clipped=0.0 2024-08-20 05:12:49,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4672600.0, ans=0.1 2024-08-20 05:12:51,120 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 16 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 05:13:00,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.94 vs. limit=22.5 2024-08-20 05:13:14,787 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7900, loss[loss=0.09118, beats_loss=0.01212, ecapa_loss=0.0001586, whisper_loss=0.07747, over 14918.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.08927, over 3804414.40 frames. ], batch size: 62, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:13:25,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4672800.0, ans=0.025 2024-08-20 05:13:37,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4672900.0, ans=0.125 2024-08-20 05:13:57,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4673000.0, ans=0.125 2024-08-20 05:14:02,772 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 05:14:18,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-20 05:14:21,089 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 05:14:26,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4673100.0, ans=0.1 2024-08-20 05:14:27,932 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 21 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-20 05:14:33,250 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 17 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 05:14:45,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=10.0 2024-08-20 05:14:57,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4673200.0, ans=0.07 2024-08-20 05:15:08,732 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 7950, loss[loss=0.0847, beats_loss=0.01126, ecapa_loss=0.0001278, whisper_loss=0.07216, over 12744.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01053, ecapa_loss=0.0001383, whisper_loss=0.08875, over 3775515.60 frames. ], batch size: 50, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:15:09,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4673300.0, ans=0.0 2024-08-20 05:15:18,253 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
24 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 05:15:28,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.672e+01 2.282e+01 2.544e+01 2.823e+01 6.203e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-20 05:16:57,533 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8000, loss[loss=0.09014, beats_loss=0.01241, ecapa_loss=0.0001507, whisper_loss=0.07622, over 14681.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08958, over 3794618.42 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:17:12,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4673800.0, ans=0.125 2024-08-20 05:17:18,777 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 12 from Vox, 50 fro AS 2024-08-20 05:17:27,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-20 05:17:29,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4673900.0, ans=0.125 2024-08-20 05:17:34,004 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 05:17:34,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=22.5 2024-08-20 05:17:46,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4674000.0, ans=0.1 2024-08-20 05:18:17,855 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 16 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 05:18:26,847 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 05:18:27,180 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 05:18:37,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4674200.0, ans=0.1 2024-08-20 05:18:39,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-20 05:18:41,448 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8050, loss[loss=0.09198, beats_loss=0.009794, ecapa_loss=0.0001451, whisper_loss=0.08074, over 17752.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.09026, over 3777954.51 frames. ], batch size: 73, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:18:46,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2024-08-20 05:18:49,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4674300.0, ans=0.125 2024-08-20 05:18:53,848 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 20 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 05:18:59,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.351e+01 2.548e+01 2.857e+01 8.304e+01, threshold=5.095e+01, percent-clipped=2.0 2024-08-20 05:19:16,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4674400.0, ans=0.5 2024-08-20 05:19:29,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4674500.0, ans=0.1 2024-08-20 05:19:41,160 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 05:19:46,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4674600.0, ans=0.0 2024-08-20 05:20:14,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4674700.0, ans=0.5 2024-08-20 05:20:19,030 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 05:20:25,119 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 18 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 05:20:32,105 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8100, loss[loss=0.09972, beats_loss=0.01018, ecapa_loss=0.0001441, whisper_loss=0.0881, over 14114.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001401, whisper_loss=0.09067, over 3758621.50 frames. ], batch size: 55, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:20:45,885 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 15 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 05:20:50,490 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 05:20:56,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4674900.0, ans=0.1 2024-08-20 05:21:10,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4674900.0, ans=0.125 2024-08-20 05:21:19,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.68 vs. limit=10.0 2024-08-20 05:21:33,258 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
25 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-20 05:21:33,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4675000.0, ans=0.0 2024-08-20 05:21:33,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4675000.0, ans=0.2 2024-08-20 05:21:54,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5 2024-08-20 05:22:05,225 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-20 05:22:09,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=22.5 2024-08-20 05:22:25,229 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8150, loss[loss=0.08441, beats_loss=0.01198, ecapa_loss=0.0001338, whisper_loss=0.07109, over 19362.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001408, whisper_loss=0.08992, over 3782290.89 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:22:47,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.184e+01 2.427e+01 2.667e+01 4.030e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-20 05:22:47,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4675400.0, ans=0.1 2024-08-20 05:23:03,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.04 vs. limit=15.0 2024-08-20 05:23:04,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4675400.0, ans=0.125 2024-08-20 05:23:19,365 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
28 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 05:23:22,229 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 36 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 05:23:40,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4675600.0, ans=0.125 2024-08-20 05:23:49,593 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 20 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 05:24:21,741 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8200, loss[loss=0.07467, beats_loss=0.01356, ecapa_loss=0.0001447, whisper_loss=0.05966, over 18523.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001407, whisper_loss=0.08965, over 3778965.60 frames. ], batch size: 81, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:24:25,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4675800.0, ans=0.125 2024-08-20 05:24:30,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=12.0 2024-08-20 05:24:48,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4675900.0, ans=0.2 2024-08-20 05:25:06,447 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
27 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 05:25:20,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4676000.0, ans=0.125 2024-08-20 05:25:34,274 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03858109936118126, model_norm_threshold=48.536659240722656 2024-08-20 05:25:34,429 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.171e+05, grad_sumsq=2.171e+05, orig_rms_sq=1.000e+00 2024-08-20 05:25:40,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4676100.0, ans=0.0 2024-08-20 05:25:46,641 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 33 from LS+wenet, 12 from Vox, 45 fro AS 2024-08-20 05:26:09,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4676200.0, ans=0.1 2024-08-20 05:26:13,479 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 05:26:15,885 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8250, loss[loss=0.08342, beats_loss=0.01238, ecapa_loss=0.00015, whisper_loss=0.06954, over 17301.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.09, over 3773913.84 frames. ], batch size: 74, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:26:18,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=4676300.0, ans=15.0 2024-08-20 05:26:25,315 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
32 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 05:26:36,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.406e+01 2.629e+01 3.126e+01 1.258e+03, threshold=5.257e+01, percent-clipped=4.0 2024-08-20 05:26:43,846 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 05:27:02,975 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 05:27:20,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4676500.0, ans=0.125 2024-08-20 05:27:53,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4676700.0, ans=0.2 2024-08-20 05:28:15,363 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8300, loss[loss=0.08863, beats_loss=0.01218, ecapa_loss=0.0001455, whisper_loss=0.07499, over 21687.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001404, whisper_loss=0.08988, over 3794391.09 frames. ], batch size: 94, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:28:36,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4676900.0, ans=0.125 2024-08-20 05:28:43,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4676900.0, ans=0.125 2024-08-20 05:29:05,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4677000.0, ans=0.2 2024-08-20 05:29:31,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4677100.0, ans=0.09899494936611666 2024-08-20 05:29:43,171 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 
15 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 05:29:49,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4677200.0, ans=0.125 2024-08-20 05:30:02,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4677200.0, ans=0.125 2024-08-20 05:30:08,650 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8350, loss[loss=0.1157, beats_loss=0.01072, ecapa_loss=0.0001636, whisper_loss=0.1034, over 21482.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001405, whisper_loss=0.09019, over 3810091.94 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:30:26,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.380e+01 2.610e+01 3.013e+01 5.449e+01, threshold=5.219e+01, percent-clipped=1.0 2024-08-20 05:30:27,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4677400.0, ans=0.2 2024-08-20 05:30:43,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4677400.0, ans=0.125 2024-08-20 05:30:43,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4677400.0, ans=0.125 2024-08-20 05:31:32,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4677700.0, ans=0.1 2024-08-20 05:31:38,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4677700.0, ans=0.125 2024-08-20 05:31:41,192 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 05:31:41,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4677700.0, ans=0.125 2024-08-20 05:31:48,900 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8400, loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001321, whisper_loss=0.09038, over 17186.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001403, whisper_loss=0.08996, over 3841274.53 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:32:05,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4677800.0, ans=0.125 2024-08-20 05:32:10,480 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 05:32:28,850 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 05:32:33,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4678000.0, ans=0.1 2024-08-20 05:32:58,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0 2024-08-20 05:33:44,286 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8450, loss[loss=0.07925, beats_loss=0.01095, ecapa_loss=0.0001673, whisper_loss=0.06662, over 21789.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08952, over 3841046.47 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:33:51,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. 
limit=15.0 2024-08-20 05:34:00,251 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 05:34:02,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4678300.0, ans=10.0 2024-08-20 05:34:03,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.232e+01 2.452e+01 2.661e+01 1.500e+02, threshold=4.905e+01, percent-clipped=2.0 2024-08-20 05:34:07,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2024-08-20 05:34:12,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4678400.0, ans=0.2 2024-08-20 05:34:17,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4678400.0, ans=0.125 2024-08-20 05:34:19,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4678400.0, ans=0.125 2024-08-20 05:34:31,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2024-08-20 05:34:34,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4678500.0, ans=0.0 2024-08-20 05:34:42,629 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
20 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 05:35:22,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4678700.0, ans=0.1 2024-08-20 05:35:23,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4678700.0, ans=0.125 2024-08-20 05:35:33,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4678800.0, ans=0.1 2024-08-20 05:35:35,745 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8500, loss[loss=0.1016, beats_loss=0.01237, ecapa_loss=0.0001104, whisper_loss=0.08813, over 21770.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0105, ecapa_loss=0.0001403, whisper_loss=0.0887, over 3836534.59 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:35:49,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4678800.0, ans=0.1 2024-08-20 05:36:33,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4679000.0, ans=0.125 2024-08-20 05:36:37,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4679000.0, ans=0.125 2024-08-20 05:36:43,159 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-20 05:36:50,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4679100.0, ans=0.125 2024-08-20 05:36:59,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4679100.0, ans=0.125 2024-08-20 05:37:14,217 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=6.900e-02 2024-08-20 05:37:26,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-20 05:37:30,690 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8550, loss[loss=0.1189, beats_loss=0.009767, ecapa_loss=0.000135, whisper_loss=0.1078, over 20354.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.0891, over 3845416.77 frames. 
], batch size: 81, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:37:45,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4679300.0, ans=0.1 2024-08-20 05:37:48,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4679300.0, ans=0.2 2024-08-20 05:37:50,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.232e+01 2.501e+01 2.726e+01 3.621e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-20 05:37:51,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4679400.0, ans=0.125 2024-08-20 05:37:53,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4679400.0, ans=0.125 2024-08-20 05:38:03,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4679400.0, ans=0.0 2024-08-20 05:38:20,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4679500.0, ans=0.1 2024-08-20 05:38:52,752 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-20 05:39:16,706 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 05:39:21,346 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.578e+00 2024-08-20 05:39:27,860 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8600, loss[loss=0.1035, beats_loss=0.01084, ecapa_loss=0.0001137, whisper_loss=0.09152, over 13543.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01049, ecapa_loss=0.0001405, whisper_loss=0.08852, over 3855144.77 frames. 
], batch size: 51, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:39:45,292 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 05:40:11,435 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-468000.pt 2024-08-20 05:40:38,414 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 28 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 05:41:02,371 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 29 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-20 05:41:06,236 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 05:41:17,511 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8650, loss[loss=0.08299, beats_loss=0.008501, ecapa_loss=0.0001477, whisper_loss=0.07301, over 13026.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001402, whisper_loss=0.08909, over 3844723.54 frames. 
], batch size: 50, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:41:18,108 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.931e+01 2024-08-20 05:41:39,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.245e+01 2.496e+01 2.765e+01 3.926e+01, threshold=4.992e+01, percent-clipped=0.0 2024-08-20 05:43:03,927 WARNING [optim.py:496] (0/4) Scaling gradients by 0.040249649435281754, model_norm_threshold=49.920475006103516 2024-08-20 05:43:04,080 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.371e+05, grad_sumsq=4.172e+04, orig_rms_sq=3.286e+00 2024-08-20 05:43:15,071 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8700, loss[loss=0.09353, beats_loss=0.009743, ecapa_loss=0.0001494, whisper_loss=0.08229, over 13813.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.08886, over 3820762.21 frames. ], batch size: 54, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:43:15,337 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 23 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 05:43:51,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2024-08-20 05:44:02,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4681000.0, ans=0.125 2024-08-20 05:44:16,057 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 05:44:21,269 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 05:44:26,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4681100.0, ans=0.125 2024-08-20 05:44:30,801 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 05:44:50,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4681200.0, ans=0.125 2024-08-20 05:45:08,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4681300.0, ans=0.0 2024-08-20 05:45:10,986 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8750, loss[loss=0.1023, beats_loss=0.00977, ecapa_loss=0.0001666, whisper_loss=0.09086, over 21583.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001399, whisper_loss=0.08971, over 3804827.39 frames. ], batch size: 89, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:45:25,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.40 vs. 
limit=15.0 2024-08-20 05:45:32,011 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.309e+01 2.539e+01 2.876e+01 1.240e+03, threshold=5.077e+01, percent-clipped=3.0 2024-08-20 05:46:30,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4681600.0, ans=0.125 2024-08-20 05:46:38,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4681600.0, ans=0.1 2024-08-20 05:46:43,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=4681700.0, ans=15.0 2024-08-20 05:47:03,870 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8800, loss[loss=0.1122, beats_loss=0.0104, ecapa_loss=0.0001249, whisper_loss=0.1005, over 22011.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.08995, over 3782544.57 frames. ], batch size: 88, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:47:25,315 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0940530002117157, model_norm_threshold=50.77210235595703 2024-08-20 05:47:25,474 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.105e+04, grad_sumsq=6.105e+04, orig_rms_sq=1.000e+00 2024-08-20 05:48:05,769 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 05:48:13,021 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 05:48:13,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4682100.0, ans=0.05 2024-08-20 05:48:32,803 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
35 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 05:48:33,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4682200.0, ans=0.2 2024-08-20 05:48:42,396 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8850, loss[loss=0.1155, beats_loss=0.009278, ecapa_loss=0.0001739, whisper_loss=0.1045, over 20910.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.000141, whisper_loss=0.08962, over 3760096.89 frames. ], batch size: 87, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:48:45,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4682300.0, ans=0.2 2024-08-20 05:48:52,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4682300.0, ans=0.1 2024-08-20 05:48:53,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4682300.0, ans=0.2 2024-08-20 05:48:58,981 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.309e+01 2.540e+01 2.877e+01 5.398e+02, threshold=5.080e+01, percent-clipped=3.0 2024-08-20 05:49:14,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4682400.0, ans=0.07 2024-08-20 05:49:21,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.21 vs. 
limit=15.0 2024-08-20 05:49:23,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4682500.0, ans=15.0 2024-08-20 05:49:37,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4682500.0, ans=0.0 2024-08-20 05:49:41,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4682600.0, ans=0.125 2024-08-20 05:49:44,647 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 05:50:02,694 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 05:50:13,493 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 05:50:18,901 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8900, loss[loss=0.08661, beats_loss=0.013, ecapa_loss=0.0001076, whisper_loss=0.07253, over 22469.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001402, whisper_loss=0.08975, over 3812129.34 frames. 
], batch size: 91, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:50:21,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4682800.0, ans=0.015 2024-08-20 05:50:40,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4682900.0, ans=0.0 2024-08-20 05:50:42,201 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.413e-01 2024-08-20 05:50:54,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4683000.0, ans=0.2 2024-08-20 05:51:01,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4683000.0, ans=0.1 2024-08-20 05:51:07,384 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 28 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 05:51:09,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4683000.0, ans=0.125 2024-08-20 05:51:15,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2024-08-20 05:51:20,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4683100.0, ans=0.07 2024-08-20 05:51:25,158 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 05:51:25,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.46 vs. 
limit=12.0 2024-08-20 05:51:54,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4683300.0, ans=0.0 2024-08-20 05:51:55,963 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 8950, loss[loss=0.1152, beats_loss=0.009219, ecapa_loss=0.0001566, whisper_loss=0.1044, over 19695.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001406, whisper_loss=0.09039, over 3798423.61 frames. ], batch size: 80, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 05:52:03,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4683300.0, ans=0.2 2024-08-20 05:52:10,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4683300.0, ans=0.2 2024-08-20 05:52:12,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.265e+01 2.516e+01 2.733e+01 4.609e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-20 05:52:16,461 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 05:52:18,139 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 05:52:45,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4683500.0, ans=0.125 2024-08-20 05:52:51,849 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 05:53:00,533 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 
18 from LS+wenet, 23 from Vox, 13 fro AS 2024-08-20 05:53:00,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4683600.0, ans=0.0 2024-08-20 05:53:25,538 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9000, loss[loss=0.09303, beats_loss=0.009901, ecapa_loss=0.0001211, whisper_loss=0.08192, over 15266.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.09056, over 3786954.93 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 1.152921504606847e+18 2024-08-20 05:53:25,540 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 05:54:02,467 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2489, over 931116.00 frames. 2024-08-20 05:54:24,145 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on SV_voxceleb1: loss=0.004011, beats_loss=0, ecapa_loss=0.0004011, whisper_loss=0, over 944235.00 frames. 2024-08-20 05:56:01,273 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on AT_audioset: loss=0.02303, beats_loss=0.02303, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 05:56:01,276 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 05:56:09,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4683800.0, ans=0.125 2024-08-20 05:56:11,226 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 20 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 05:56:11,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2024-08-20 05:56:16,516 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 05:56:26,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4683900.0, ans=0.125 2024-08-20 05:56:36,617 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 05:57:24,000 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9050, loss[loss=0.1072, beats_loss=0.01181, ecapa_loss=0.000115, whisper_loss=0.09421, over 17598.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.09028, over 3765558.96 frames. ], batch size: 70, lr: 1.91e-03, grad_scale: 1.152921504606847e+18 2024-08-20 05:57:26,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-08-20 05:57:38,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.206e+01 2.470e+01 2.742e+01 4.296e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-20 05:57:50,355 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 17 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 05:57:52,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=22.5 2024-08-20 05:57:53,809 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 05:58:04,279 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 05:58:17,420 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-20 05:58:21,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.27 vs. 
limit=15.0 2024-08-20 05:58:28,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4684700.0, ans=0.0 2024-08-20 05:58:38,028 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 8 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 05:58:45,889 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9100, loss[loss=0.09504, beats_loss=0.01175, ecapa_loss=0.0001106, whisper_loss=0.08218, over 20920.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01046, ecapa_loss=0.0001401, whisper_loss=0.08955, over 3748256.34 frames. ], batch size: 81, lr: 1.91e-03, grad_scale: 1.152921504606847e+18 2024-08-20 05:58:48,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4684800.0, ans=0.125 2024-08-20 05:58:48,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-08-20 05:59:01,966 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 05:59:03,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4684900.0, ans=0.5 2024-08-20 05:59:05,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4684900.0, ans=0.1 2024-08-20 05:59:18,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4685000.0, ans=0.125 2024-08-20 05:59:27,908 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 05:59:33,454 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 05:59:34,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4685100.0, ans=0.2 2024-08-20 05:59:35,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-20 05:59:36,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4685100.0, ans=0.0 2024-08-20 05:59:41,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4685100.0, ans=0.0 2024-08-20 06:00:10,210 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9150, loss[loss=0.08021, beats_loss=0.01183, ecapa_loss=0.0001174, whisper_loss=0.0672, over 15606.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001394, whisper_loss=0.08987, over 3789530.22 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:00:22,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-20 06:00:27,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.310e+01 2.550e+01 2.853e+01 1.227e+02, threshold=5.100e+01, percent-clipped=1.0 2024-08-20 06:00:43,131 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 06:00:49,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4685500.0, ans=0.125 2024-08-20 06:00:57,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.09 vs. 
limit=15.0 2024-08-20 06:01:02,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4685600.0, ans=0.1 2024-08-20 06:01:04,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0 2024-08-20 06:01:11,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4685600.0, ans=0.125 2024-08-20 06:01:20,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0 2024-08-20 06:01:35,567 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9200, loss[loss=0.1226, beats_loss=0.008586, ecapa_loss=0.0001334, whisper_loss=0.1127, over 16879.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.08939, over 3749541.14 frames. ], batch size: 64, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:01:39,240 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 06:01:43,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4685800.0, ans=0.0 2024-08-20 06:01:45,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4685800.0, ans=0.1 2024-08-20 06:01:56,902 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 06:02:19,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4686000.0, ans=0.125 2024-08-20 06:02:29,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. 
limit=15.0 2024-08-20 06:02:37,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4686100.0, ans=0.125 2024-08-20 06:03:02,851 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9250, loss[loss=0.06939, beats_loss=0.01131, ecapa_loss=0.0001306, whisper_loss=0.05678, over 15692.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01047, ecapa_loss=0.0001397, whisper_loss=0.08978, over 3763672.66 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:03:20,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.318e+01 2.500e+01 2.733e+01 3.571e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-20 06:03:52,471 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 20 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 06:04:25,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4686700.0, ans=0.0 2024-08-20 06:04:28,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4686700.0, ans=0.1 2024-08-20 06:04:31,235 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9300, loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.000149, whisper_loss=0.09012, over 21831.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001408, whisper_loss=0.08966, over 3799777.45 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:04:31,586 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 06:04:47,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4686900.0, ans=0.1 2024-08-20 06:04:53,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4686900.0, ans=0.125 2024-08-20 06:04:53,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4686900.0, ans=0.1 2024-08-20 06:04:54,260 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 06:05:16,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4687000.0, ans=0.0 2024-08-20 06:05:34,008 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 06:05:47,557 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 31 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 06:06:08,689 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9350, loss[loss=0.08503, beats_loss=0.01183, ecapa_loss=0.0001211, whisper_loss=0.07199, over 22886.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.08959, over 3809758.29 frames. ], batch size: 93, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:06:25,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4687300.0, ans=0.125 2024-08-20 06:06:26,947 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 
36 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 06:06:28,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.303e+01 2.586e+01 2.791e+01 3.756e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-20 06:06:35,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2024-08-20 06:06:40,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4687400.0, ans=0.125 2024-08-20 06:06:46,111 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-20 06:06:47,989 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 25 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 06:07:04,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=22.5 2024-08-20 06:07:06,866 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 06:07:26,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4687700.0, ans=0.125 2024-08-20 06:07:28,688 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.401e+01 2024-08-20 06:07:31,666 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 17 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 06:07:31,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4687700.0, ans=0.125 2024-08-20 06:07:39,644 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9400, loss[loss=0.09737, beats_loss=0.008893, ecapa_loss=0.000131, whisper_loss=0.08716, over 15238.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001397, whisper_loss=0.08949, over 3809245.84 frames. ], batch size: 56, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:08:19,176 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 24 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-20 06:08:24,500 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 19 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 06:08:24,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4688000.0, ans=0.1 2024-08-20 06:08:24,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4688000.0, ans=0.0 2024-08-20 06:08:25,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0 2024-08-20 06:08:26,688 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 25 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 06:08:38,882 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 06:08:40,838 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 06:08:48,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4688100.0, ans=0.07 2024-08-20 06:09:00,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4688200.0, ans=0.125 2024-08-20 06:09:08,340 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9450, loss[loss=0.08728, beats_loss=0.01296, ecapa_loss=9.709e-05, whisper_loss=0.07334, over 19064.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.08919, over 3823798.59 frames. 
], batch size: 74, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:09:19,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0 2024-08-20 06:09:25,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-20 06:09:26,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4688400.0, ans=0.125 2024-08-20 06:09:27,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.376e+01 2.594e+01 2.934e+01 1.922e+02, threshold=5.189e+01, percent-clipped=1.0 2024-08-20 06:09:41,327 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 06:10:19,152 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-20 06:10:36,259 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9500, loss[loss=0.1237, beats_loss=0.01002, ecapa_loss=0.0001277, whisper_loss=0.1124, over 24835.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.08974, over 3834502.25 frames. ], batch size: 94, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:10:44,330 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.655e-03 2024-08-20 06:10:45,533 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 19 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-20 06:10:45,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4688800.0, ans=0.125 2024-08-20 06:10:56,131 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 06:11:13,025 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 06:11:25,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2024-08-20 06:11:26,586 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 06:11:43,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4689100.0, ans=0.0 2024-08-20 06:12:03,403 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9550, loss[loss=0.08672, beats_loss=0.01172, ecapa_loss=0.0001601, whisper_loss=0.0734, over 21198.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01034, ecapa_loss=0.0001409, whisper_loss=0.0896, over 3800872.16 frames. ], batch size: 92, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:12:21,198 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.271e+01 2.487e+01 2.797e+01 1.341e+02, threshold=4.974e+01, percent-clipped=1.0 2024-08-20 06:12:28,813 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 30 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-20 06:12:47,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-08-20 06:13:27,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4689700.0, ans=0.025 2024-08-20 06:13:31,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. 
limit=15.0 2024-08-20 06:13:32,315 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9600, loss[loss=0.1103, beats_loss=0.008122, ecapa_loss=0.0001682, whisper_loss=0.1005, over 14565.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.0001409, whisper_loss=0.08962, over 3766825.67 frames. ], batch size: 57, lr: 1.91e-03, grad_scale: 5.764607523034235e+17 2024-08-20 06:13:53,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4689900.0, ans=0.1 2024-08-20 06:14:02,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4689900.0, ans=0.0 2024-08-20 06:14:31,763 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 06:14:56,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4690200.0, ans=0.1 2024-08-20 06:15:02,502 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-20 06:15:06,759 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9650, loss[loss=0.08998, beats_loss=0.01077, ecapa_loss=0.0001107, whisper_loss=0.07811, over 15356.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001414, whisper_loss=0.08997, over 3787288.25 frames. 
], batch size: 57, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:15:26,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.336e+01 2.779e+01 3.042e+01 4.169e+01, threshold=5.558e+01, percent-clipped=0.0 2024-08-20 06:15:26,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4690400.0, ans=0.0 2024-08-20 06:15:43,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4690500.0, ans=0.0 2024-08-20 06:15:45,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-08-20 06:16:14,355 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 29 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 06:16:27,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=15.0 2024-08-20 06:16:32,797 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9700, loss[loss=0.1144, beats_loss=0.01018, ecapa_loss=0.000147, whisper_loss=0.1027, over 14943.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001411, whisper_loss=0.08984, over 3767644.14 frames. ], batch size: 58, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:16:39,052 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 06:16:42,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4690800.0, ans=0.025 2024-08-20 06:16:49,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4690900.0, ans=0.1 2024-08-20 06:17:18,600 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
28 from LS+wenet, 20 from Vox, 15 fro AS 2024-08-20 06:17:43,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4691200.0, ans=0.0 2024-08-20 06:17:45,063 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 06:17:48,304 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 28 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 06:17:54,436 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9750, loss[loss=0.1084, beats_loss=0.009769, ecapa_loss=0.0001365, whisper_loss=0.0973, over 22462.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.0897, over 3776944.03 frames. ], batch size: 90, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:18:03,229 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 13 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 06:18:12,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.245e+01 2.617e+01 2.841e+01 5.114e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-20 06:18:13,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2024-08-20 06:18:15,875 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 38 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-20 06:18:34,281 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 06:18:54,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4691600.0, ans=0.125 2024-08-20 06:19:16,749 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9800, loss[loss=0.0928, beats_loss=0.01055, ecapa_loss=0.0001185, whisper_loss=0.08106, over 19597.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08877, over 3797510.65 frames. ], batch size: 76, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:19:23,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4691800.0, ans=0.125 2024-08-20 06:19:30,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4691800.0, ans=0.0 2024-08-20 06:19:30,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4691800.0, ans=0.1 2024-08-20 06:19:39,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4691900.0, ans=0.0 2024-08-20 06:20:13,749 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-20 06:20:15,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4692100.0, ans=0.1 2024-08-20 06:20:20,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4692100.0, ans=0.0 2024-08-20 06:20:38,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4692300.0, ans=0.125 2024-08-20 06:20:39,623 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9850, loss[loss=0.08742, beats_loss=0.008876, ecapa_loss=0.0001489, whisper_loss=0.07706, over 15684.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001396, whisper_loss=0.08922, over 3798208.18 frames. ], batch size: 63, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:20:44,490 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 06:20:58,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.366e+01 2.568e+01 2.856e+01 6.259e+01, threshold=5.136e+01, percent-clipped=2.0 2024-08-20 06:20:58,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4692400.0, ans=0.0 2024-08-20 06:21:08,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4692400.0, ans=0.1 2024-08-20 06:21:12,908 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 18 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 06:21:19,906 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 06:21:21,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4692500.0, ans=0.125 2024-08-20 06:21:22,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-20 06:21:23,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-08-20 06:21:55,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4692700.0, ans=0.125 2024-08-20 06:21:57,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2024-08-20 06:22:02,849 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9900, loss[loss=0.09, beats_loss=0.01186, ecapa_loss=0.0001499, whisper_loss=0.07665, over 21617.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.00014, whisper_loss=0.08933, over 3819714.48 frames. ], batch size: 91, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:22:13,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4692800.0, ans=0.1 2024-08-20 06:22:31,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4692900.0, ans=0.2 2024-08-20 06:22:50,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-08-20 06:22:52,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4693100.0, ans=0.125 2024-08-20 06:23:04,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4693100.0, ans=0.2 2024-08-20 06:23:24,910 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 9950, loss[loss=0.1037, beats_loss=0.008943, ecapa_loss=0.0001308, whisper_loss=0.09348, over 19762.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.000139, whisper_loss=0.08979, over 3810165.91 frames. ], batch size: 75, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:23:25,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4693300.0, ans=0.1 2024-08-20 06:23:42,407 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.243e+01 2.443e+01 2.710e+01 3.765e+01, threshold=4.885e+01, percent-clipped=0.0 2024-08-20 06:24:06,211 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 06:24:09,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4693500.0, ans=0.05 2024-08-20 06:24:18,234 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 06:24:24,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4693600.0, ans=0.125 2024-08-20 06:24:40,038 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 22 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 06:24:47,372 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 23 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-20 06:24:49,106 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10000, loss[loss=0.09647, beats_loss=0.01284, ecapa_loss=0.0001014, whisper_loss=0.08262, over 20599.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001386, whisper_loss=0.08948, over 3809802.07 frames. ], batch size: 79, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:24:57,557 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 21 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-20 06:25:21,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4693900.0, ans=0.125 2024-08-20 06:25:30,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.83 vs. limit=12.0 2024-08-20 06:25:39,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=15.0 2024-08-20 06:25:40,226 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 06:26:10,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4694200.0, ans=0.125 2024-08-20 06:26:19,062 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10050, loss[loss=0.1355, beats_loss=0.007213, ecapa_loss=0.0001545, whisper_loss=0.1268, over 17979.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001379, whisper_loss=0.08982, over 3793698.40 frames. ], batch size: 68, lr: 1.91e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:26:37,061 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.304e+01 2.607e+01 2.920e+01 4.346e+01, threshold=5.214e+01, percent-clipped=0.0 2024-08-20 06:26:41,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=4694400.0, ans=0.2 2024-08-20 06:27:08,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-08-20 06:27:10,075 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 06:27:15,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2024-08-20 06:27:21,043 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 25 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 06:27:38,505 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 24 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 06:27:38,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4694700.0, ans=0.125 2024-08-20 06:27:44,355 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
21 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-20 06:27:47,732 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10100, loss[loss=0.09342, beats_loss=0.01038, ecapa_loss=0.0001352, whisper_loss=0.08169, over 16171.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001374, whisper_loss=0.08962, over 3775775.71 frames. ], batch size: 65, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:27:58,309 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 06:28:04,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-08-20 06:28:06,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2024-08-20 06:28:17,230 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 06:28:27,833 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 06:28:29,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4695000.0, ans=0.0 2024-08-20 06:28:33,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4695000.0, ans=0.1 2024-08-20 06:28:33,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4695000.0, ans=0.0 2024-08-20 06:29:01,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.08 vs. 
limit=22.5 2024-08-20 06:29:22,547 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10150, loss[loss=0.1064, beats_loss=0.01175, ecapa_loss=0.0001228, whisper_loss=0.09344, over 23247.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001389, whisper_loss=0.08994, over 3806187.30 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:29:26,618 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-20 06:29:26,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4695300.0, ans=0.2 2024-08-20 06:29:31,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.38 vs. limit=22.5 2024-08-20 06:29:37,306 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 06:29:44,563 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.187e+01 2.409e+01 2.808e+01 3.836e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-20 06:29:49,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4695400.0, ans=0.125 2024-08-20 06:30:03,922 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 06:30:15,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4695500.0, ans=0.0 2024-08-20 06:30:21,415 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
30 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 06:30:23,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4695600.0, ans=0.1 2024-08-20 06:30:23,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4695600.0, ans=0.1 2024-08-20 06:30:26,125 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 34 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 06:30:34,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4695600.0, ans=0.1 2024-08-20 06:30:42,160 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 17 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 06:30:58,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4695700.0, ans=0.1 2024-08-20 06:31:02,386 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10200, loss[loss=0.1067, beats_loss=0.009315, ecapa_loss=0.0001469, whisper_loss=0.09591, over 20718.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01053, ecapa_loss=0.0001385, whisper_loss=0.08943, over 3771800.12 frames. ], batch size: 81, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:31:52,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4696000.0, ans=0.125 2024-08-20 06:32:25,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4696200.0, ans=0.1 2024-08-20 06:32:25,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. 
limit=15.0 2024-08-20 06:32:39,475 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10250, loss[loss=0.07197, beats_loss=0.01305, ecapa_loss=0.0001203, whisper_loss=0.05772, over 15130.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001389, whisper_loss=0.08978, over 3783302.26 frames. ], batch size: 60, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:32:41,687 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 06:32:52,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4696300.0, ans=0.125 2024-08-20 06:32:55,746 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07058558613061905, model_norm_threshold=48.17802047729492 2024-08-20 06:32:55,902 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.542e+04, grad_sumsq=7.542e+04, orig_rms_sq=1.000e+00 2024-08-20 06:33:01,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.230e+01 2.493e+01 2.759e+01 6.825e+02, threshold=4.986e+01, percent-clipped=2.0 2024-08-20 06:33:10,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4696400.0, ans=0.125 2024-08-20 06:33:13,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4696400.0, ans=0.05 2024-08-20 06:33:22,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4696500.0, ans=0.0 2024-08-20 06:33:41,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4696600.0, ans=0.1 2024-08-20 06:33:48,355 INFO [scaling.py:1024] (0/4) Whitening: 
name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2024-08-20 06:34:02,429 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07055973261594772, model_norm_threshold=49.85801315307617 2024-08-20 06:34:02,583 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.714e+04, grad_sumsq=4.714e+04, orig_rms_sq=1.000e+00 2024-08-20 06:34:22,326 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10300, loss[loss=0.1025, beats_loss=0.007235, ecapa_loss=0.000148, whisper_loss=0.09383, over 12480.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.0902, over 3787308.85 frames. ], batch size: 49, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:34:26,367 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 24 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-20 06:34:26,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.39 vs. limit=10.0 2024-08-20 06:35:23,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4697100.0, ans=0.125 2024-08-20 06:35:58,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4697200.0, ans=0.0 2024-08-20 06:36:00,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4697200.0, ans=0.0 2024-08-20 06:36:04,309 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10350, loss[loss=0.1009, beats_loss=0.01065, ecapa_loss=0.0001379, whisper_loss=0.08885, over 18698.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.0896, over 3770620.76 frames. 
], batch size: 75, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:36:08,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4697300.0, ans=0.1 2024-08-20 06:36:27,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.261e+01 2.488e+01 2.810e+01 7.066e+02, threshold=4.977e+01, percent-clipped=3.0 2024-08-20 06:36:31,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4697400.0, ans=0.125 2024-08-20 06:37:17,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4697600.0, ans=0.125 2024-08-20 06:37:25,085 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 06:37:41,839 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 06:37:47,020 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10400, loss[loss=0.09059, beats_loss=0.01355, ecapa_loss=0.0001061, whisper_loss=0.07597, over 22451.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.09071, over 3776898.79 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:38:05,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4697800.0, ans=0.025 2024-08-20 06:38:06,758 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
21 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 06:38:07,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4697900.0, ans=0.125 2024-08-20 06:38:15,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4697900.0, ans=0.125 2024-08-20 06:38:39,878 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 06:38:40,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4698000.0, ans=0.125 2024-08-20 06:38:45,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4698000.0, ans=0.125 2024-08-20 06:39:04,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2024-08-20 06:39:06,982 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 26 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 06:39:18,855 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 16 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 06:39:20,831 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 14 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 06:39:22,650 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 15 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 06:39:31,458 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10450, loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=9.004e-05, whisper_loss=0.09168, over 13950.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.09021, over 3746346.58 frames. 
], batch size: 52, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:39:39,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-20 06:39:42,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-20 06:39:53,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.263e+01 2.457e+01 2.777e+01 9.468e+01, threshold=4.915e+01, percent-clipped=2.0 2024-08-20 06:40:00,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.53 vs. limit=15.0 2024-08-20 06:40:09,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-08-20 06:40:27,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4698500.0, ans=0.125 2024-08-20 06:40:28,781 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 12 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 06:40:47,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4698600.0, ans=0.0 2024-08-20 06:41:00,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. 
limit=15.0 2024-08-20 06:41:09,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4698800.0, ans=0.125 2024-08-20 06:41:09,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4698800.0, ans=0.125 2024-08-20 06:41:11,105 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10500, loss[loss=0.1022, beats_loss=0.008102, ecapa_loss=0.0001948, whisper_loss=0.09214, over 15993.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001404, whisper_loss=0.09043, over 3746445.14 frames. ], batch size: 64, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:41:13,381 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 28 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-20 06:41:13,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4698800.0, ans=0.125 2024-08-20 06:41:20,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4698800.0, ans=0.0 2024-08-20 06:41:21,843 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 06:41:22,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.44 vs. 
limit=12.0 2024-08-20 06:41:25,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4698800.0, ans=0.125 2024-08-20 06:41:45,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4698900.0, ans=0.125 2024-08-20 06:41:56,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4699000.0, ans=0.0 2024-08-20 06:42:05,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4699000.0, ans=0.0 2024-08-20 06:42:24,633 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-20 06:42:24,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4699100.0, ans=0.125 2024-08-20 06:42:24,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4699100.0, ans=0.1 2024-08-20 06:42:26,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4699100.0, ans=0.125 2024-08-20 06:42:28,411 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 06:42:35,189 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-20 06:42:54,630 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10550, loss[loss=0.1172, beats_loss=0.009808, ecapa_loss=0.0001723, whisper_loss=0.1057, over 19108.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.000141, whisper_loss=0.08996, over 3776549.17 frames. 
], batch size: 78, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:42:57,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.48 vs. limit=22.5 2024-08-20 06:43:17,676 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.308e+01 2.564e+01 2.826e+01 3.881e+01, threshold=5.129e+01, percent-clipped=0.0 2024-08-20 06:43:30,359 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-20 06:43:34,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4699500.0, ans=0.1 2024-08-20 06:43:51,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4699500.0, ans=0.2 2024-08-20 06:44:23,129 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-20 06:44:38,382 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10600, loss[loss=0.09195, beats_loss=0.01125, ecapa_loss=0.0001193, whisper_loss=0.07951, over 19177.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.08983, over 3762214.92 frames. ], batch size: 75, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:44:38,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4699800.0, ans=0.125 2024-08-20 06:44:40,877 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 06:45:24,871 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 06:45:38,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.06 vs. 
limit=12.0 2024-08-20 06:46:01,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=12.0 2024-08-20 06:46:07,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4700200.0, ans=0.0 2024-08-20 06:46:17,370 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 17 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 06:46:24,867 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10650, loss[loss=0.1045, beats_loss=0.01089, ecapa_loss=0.0001198, whisper_loss=0.09241, over 23594.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.000141, whisper_loss=0.09047, over 3756135.75 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:46:29,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4700300.0, ans=0.1 2024-08-20 06:46:39,298 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
23 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-20 06:46:41,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4700300.0, ans=0.1 2024-08-20 06:46:45,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4700400.0, ans=0.125 2024-08-20 06:46:46,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.694e+01 2.297e+01 2.515e+01 2.879e+01 5.897e+01, threshold=5.029e+01, percent-clipped=1.0 2024-08-20 06:46:47,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4700400.0, ans=0.125 2024-08-20 06:46:50,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4700400.0, ans=0.125 2024-08-20 06:46:54,752 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 06:46:58,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4700400.0, ans=0.125 2024-08-20 06:47:07,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=4700500.0, ans=15.0 2024-08-20 06:47:13,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.93 vs. limit=22.5 2024-08-20 06:47:34,941 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 06:47:50,109 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 38 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 06:48:01,424 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 06:48:09,991 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10700, loss[loss=0.09993, beats_loss=0.01121, ecapa_loss=0.0001209, whisper_loss=0.08751, over 16887.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.000141, whisper_loss=0.09038, over 3755417.71 frames. ], batch size: 68, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:48:13,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4700800.0, ans=0.1 2024-08-20 06:48:30,069 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 06:48:33,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4700900.0, ans=0.125 2024-08-20 06:48:39,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4700900.0, ans=0.125 2024-08-20 06:48:52,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2024-08-20 06:49:02,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4701000.0, ans=0.125 2024-08-20 06:49:09,274 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 06:49:13,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4701100.0, ans=0.0 2024-08-20 06:49:15,438 WARNING [optim.py:496] (0/4) Scaling gradients by 0.023652782663702965, model_norm_threshold=50.29466247558594 2024-08-20 06:49:15,592 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.689e+06, grad_sumsq=1.581e+08, orig_rms_sq=1.068e-02 2024-08-20 06:49:50,248 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10750, loss[loss=0.07518, beats_loss=0.01125, ecapa_loss=0.0001412, whisper_loss=0.06252, over 15214.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001407, whisper_loss=0.09011, over 3769854.40 frames. ], batch size: 64, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:49:58,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4701300.0, ans=0.5 2024-08-20 06:50:10,419 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 06:50:11,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.335e+01 2.524e+01 2.835e+01 2.126e+03, threshold=5.048e+01, percent-clipped=3.0 2024-08-20 06:50:24,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4701400.0, ans=0.125 2024-08-20 06:51:16,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-08-20 06:51:26,254 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 06:51:26,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4701700.0, ans=0.05 2024-08-20 06:51:29,633 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10800, loss[loss=0.1192, beats_loss=0.009918, ecapa_loss=0.0001387, whisper_loss=0.1079, over 23174.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.000141, whisper_loss=0.09025, over 3772499.21 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:51:30,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4701800.0, ans=0.0 2024-08-20 06:51:37,978 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 19 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 06:51:55,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4701900.0, ans=0.125 2024-08-20 06:52:05,870 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 06:52:24,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-08-20 06:52:50,395 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 26 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 06:53:04,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.11 vs. limit=10.0 2024-08-20 06:53:08,925 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10850, loss[loss=0.08503, beats_loss=0.01233, ecapa_loss=0.0001317, whisper_loss=0.07138, over 22044.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001394, whisper_loss=0.0898, over 3790446.60 frames. 
], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:53:11,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4702300.0, ans=0.2 2024-08-20 06:53:30,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.288e+01 2.456e+01 2.756e+01 3.873e+01, threshold=4.912e+01, percent-clipped=0.0 2024-08-20 06:53:41,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-20 06:54:27,124 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 06:54:39,395 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 17 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-20 06:54:48,397 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10900, loss[loss=0.1129, beats_loss=0.009142, ecapa_loss=0.0001314, whisper_loss=0.1024, over 15301.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001401, whisper_loss=0.08989, over 3766431.90 frames. ], batch size: 59, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:54:52,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4702800.0, ans=0.125 2024-08-20 06:55:31,081 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 27 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 06:55:35,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4703000.0, ans=0.125 2024-08-20 06:55:47,701 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 19 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-20 06:55:55,282 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 06:56:20,697 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
23 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-20 06:56:25,272 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 10950, loss[loss=0.1313, beats_loss=0.008203, ecapa_loss=0.0001641, whisper_loss=0.1215, over 18990.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.000141, whisper_loss=0.09038, over 3747924.44 frames. ], batch size: 74, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:56:47,787 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.264e+01 2.421e+01 2.646e+01 4.130e+01, threshold=4.843e+01, percent-clipped=0.0 2024-08-20 06:56:57,325 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04336608201265335, model_norm_threshold=48.42934799194336 2024-08-20 06:56:57,480 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.187e+05, grad_sumsq=2.051e+07, orig_rms_sq=1.067e-02 2024-08-20 06:57:14,879 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 06:57:48,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4703700.0, ans=0.125 2024-08-20 06:57:56,653 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11000, loss[loss=0.09746, beats_loss=0.01205, ecapa_loss=0.000144, whisper_loss=0.08397, over 23074.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001424, whisper_loss=0.09017, over 3755605.93 frames. 
], batch size: 91, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:58:03,353 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06608612835407257, model_norm_threshold=48.42934799194336 2024-08-20 06:58:03,512 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.920e+04, grad_sumsq=8.920e+04, orig_rms_sq=1.000e+00 2024-08-20 06:58:08,708 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 06:58:35,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4704000.0, ans=0.125 2024-08-20 06:58:36,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5 2024-08-20 06:58:43,625 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 21 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 06:58:57,065 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 06:58:59,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4704100.0, ans=0.1 2024-08-20 06:59:03,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4704100.0, ans=0.125 2024-08-20 06:59:08,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4704200.0, ans=0.015 2024-08-20 06:59:23,034 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11050, loss[loss=0.1115, beats_loss=0.007634, ecapa_loss=0.0002268, whisper_loss=0.1016, over 16121.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.09028, over 3788909.97 frames. 
], batch size: 71, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 06:59:36,101 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 19 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 06:59:43,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.346e+01 2.551e+01 2.950e+01 1.117e+03, threshold=5.103e+01, percent-clipped=5.0 2024-08-20 06:59:52,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4704400.0, ans=0.1 2024-08-20 07:00:03,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4704500.0, ans=0.125 2024-08-20 07:00:57,175 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11100, loss[loss=0.1337, beats_loss=0.006883, ecapa_loss=0.0001526, whisper_loss=0.1253, over 23242.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001412, whisper_loss=0.09023, over 3800865.47 frames. ], batch size: 89, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:01:07,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4704800.0, ans=0.125 2024-08-20 07:01:30,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4704900.0, ans=0.1 2024-08-20 07:01:38,529 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 07:02:16,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4705200.0, ans=0.1 2024-08-20 07:02:25,587 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 07:02:27,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4705200.0, ans=0.0 2024-08-20 07:02:35,149 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11150, loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001511, whisper_loss=0.08971, over 22263.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001416, whisper_loss=0.0906, over 3834326.46 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:02:59,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.354e+01 2.488e+01 2.886e+01 1.211e+02, threshold=4.976e+01, percent-clipped=2.0 2024-08-20 07:03:13,285 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 07:03:23,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4705500.0, ans=0.125 2024-08-20 07:03:25,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4705500.0, ans=0.1 2024-08-20 07:03:38,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4705600.0, ans=0.125 2024-08-20 07:04:01,764 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09045316278934479, model_norm_threshold=49.755611419677734 2024-08-20 07:04:01,920 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.284e+04, grad_sumsq=3.284e+04, orig_rms_sq=1.000e+00 2024-08-20 07:04:12,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.63 vs. 
limit=15.0 2024-08-20 07:04:21,369 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11200, loss[loss=0.09455, beats_loss=0.01254, ecapa_loss=0.0001228, whisper_loss=0.08078, over 23277.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001408, whisper_loss=0.09081, over 3838855.44 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:04:27,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=12.0 2024-08-20 07:04:41,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4705900.0, ans=0.125 2024-08-20 07:05:10,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4706000.0, ans=0.125 2024-08-20 07:05:13,915 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-20 07:05:17,582 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-20 07:05:19,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.98 vs. limit=6.0 2024-08-20 07:05:34,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4706100.0, ans=0.0 2024-08-20 07:05:51,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4706200.0, ans=0.125 2024-08-20 07:05:54,874 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 07:05:59,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.80 vs. 
limit=12.0 2024-08-20 07:06:00,559 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11250, loss[loss=0.09947, beats_loss=0.00752, ecapa_loss=0.0001753, whisper_loss=0.09019, over 15940.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001405, whisper_loss=0.09098, over 3862023.80 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:06:01,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4706300.0, ans=0.2 2024-08-20 07:06:18,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4706400.0, ans=0.025 2024-08-20 07:06:23,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.303e+01 2.565e+01 2.928e+01 5.501e+02, threshold=5.130e+01, percent-clipped=1.0 2024-08-20 07:06:25,205 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 07:06:30,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4706400.0, ans=0.125 2024-08-20 07:07:06,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4706600.0, ans=0.125 2024-08-20 07:07:13,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4706600.0, ans=0.2 2024-08-20 07:07:15,552 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-20 07:07:23,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4706700.0, ans=0.0 2024-08-20 07:07:40,630 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11300, loss[loss=0.09213, beats_loss=0.01161, ecapa_loss=0.0001465, whisper_loss=0.07905, over 22232.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001404, whisper_loss=0.09056, over 3847054.78 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:08:03,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4706900.0, ans=0.125 2024-08-20 07:08:06,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=22.5 2024-08-20 07:08:23,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4707000.0, ans=0.0 2024-08-20 07:09:05,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4707200.0, ans=0.125 2024-08-20 07:09:06,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4707200.0, ans=0.125 2024-08-20 07:09:08,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4707200.0, ans=0.125 2024-08-20 07:09:16,178 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11350, loss[loss=0.1158, beats_loss=0.00918, ecapa_loss=0.0001332, whisper_loss=0.1052, over 20742.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001404, whisper_loss=0.09077, over 3803533.16 frames. 
], batch size: 78, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:09:36,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.220e+01 2.467e+01 2.786e+01 5.186e+01, threshold=4.935e+01, percent-clipped=1.0 2024-08-20 07:09:41,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4707400.0, ans=0.1 2024-08-20 07:09:47,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=12.0 2024-08-20 07:09:51,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.90 vs. limit=6.0 2024-08-20 07:09:56,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4707500.0, ans=0.125 2024-08-20 07:10:09,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4707500.0, ans=0.2 2024-08-20 07:10:17,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4707600.0, ans=0.125 2024-08-20 07:10:20,522 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 24 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 07:10:45,846 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 07:10:49,667 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11400, loss[loss=0.09214, beats_loss=0.01096, ecapa_loss=7.767e-05, whisper_loss=0.0804, over 17813.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01028, ecapa_loss=0.0001401, whisper_loss=0.09069, over 3801769.26 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 07:11:17,778 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 
35 from LS+wenet, 28 from Vox, 32 from AS
2024-08-20 07:11:47,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4708100.0, ans=0.2
2024-08-20 07:11:53,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4708100.0, ans=0.0
2024-08-20 07:11:58,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=4708100.0, ans=0.1
2024-08-20 07:12:02,899 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 from AS
2024-08-20 07:12:06,820 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 24 from LS+wenet, 14 from Vox, 34 from AS
2024-08-20 07:12:07,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4708200.0, ans=0.125
2024-08-20 07:12:23,557 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11450, loss[loss=0.1007, beats_loss=0.007963, ecapa_loss=0.0001705, whisper_loss=0.09106, over 13219.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01027, ecapa_loss=0.0001407, whisper_loss=0.09029, over 3765018.63 frames. ], batch size: 52, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 07:12:45,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.318e+01 2.506e+01 2.910e+01 4.315e+01, threshold=5.012e+01, percent-clipped=0.0
2024-08-20 07:12:54,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0
2024-08-20 07:13:08,229 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 from AS
2024-08-20 07:13:12,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4708500.0, ans=0.125
2024-08-20 07:13:16,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4708500.0, ans=0.1
2024-08-20 07:13:20,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4708600.0, ans=0.0
2024-08-20 07:13:24,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0
2024-08-20 07:13:29,091 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS
2024-08-20 07:13:36,455 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 from AS
2024-08-20 07:13:47,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4708700.0, ans=0.125
2024-08-20 07:13:58,851 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11500, loss[loss=0.1103, beats_loss=0.0103, ecapa_loss=0.0001452, whisper_loss=0.09852, over 22160.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01028, ecapa_loss=0.0001417, whisper_loss=0.08994, over 3774144.31 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 07:14:03,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4708800.0, ans=0.125
2024-08-20 07:14:26,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4708900.0, ans=0.125
2024-08-20 07:14:35,409 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 13 from LS+wenet, 21 from Vox, 21 from AS
2024-08-20 07:14:45,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4709000.0, ans=0.2
2024-08-20 07:14:52,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4709100.0, ans=0.125
2024-08-20 07:14:56,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4709100.0, ans=0.125
2024-08-20 07:15:09,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4709100.0, ans=0.09899494936611666
2024-08-20 07:15:39,197 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11550, loss[loss=0.1085, beats_loss=0.009077, ecapa_loss=0.0001644, whisper_loss=0.09775, over 18105.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01026, ecapa_loss=0.0001414, whisper_loss=0.08987, over 3750962.69 frames. ], batch size: 73, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 07:16:01,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.302e+01 2.543e+01 2.812e+01 2.319e+02, threshold=5.086e+01, percent-clipped=3.0
2024-08-20 07:16:18,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4709500.0, ans=0.125
2024-08-20 07:16:32,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4709500.0, ans=0.0
2024-08-20 07:17:07,258 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 23 from LS+wenet, 23 from Vox, 39 from AS
2024-08-20 07:17:25,994 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11600, loss[loss=0.103, beats_loss=0.01162, ecapa_loss=0.000135, whisper_loss=0.09, over 18341.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01026, ecapa_loss=0.0001413, whisper_loss=0.0899, over 3778465.41 frames. ], batch size: 74, lr: 1.90e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 07:17:36,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4709800.0, ans=0.125
2024-08-20 07:17:49,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4709900.0, ans=0.0
2024-08-20 07:17:57,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0
2024-08-20 07:18:02,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4709900.0, ans=0.1
2024-08-20 07:18:21,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4710000.0, ans=0.0
2024-08-20 07:18:26,097 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 26 from Vox, 30 from AS
2024-08-20 07:18:26,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4710100.0, ans=0.0
2024-08-20 07:19:09,675 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11650, loss[loss=0.1115, beats_loss=0.01013, ecapa_loss=0.0001256, whisper_loss=0.1001, over 18914.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01031, ecapa_loss=0.0001409, whisper_loss=0.08948, over 3786656.77 frames. ], batch size: 72, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:19:10,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4710300.0, ans=0.07
2024-08-20 07:19:17,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.817e+00
2024-08-20 07:19:29,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4710400.0, ans=0.0
2024-08-20 07:19:29,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4710400.0, ans=0.0
2024-08-20 07:19:33,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.347e+01 2.610e+01 2.991e+01 4.037e+01, threshold=5.219e+01, percent-clipped=0.0
2024-08-20 07:19:34,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0
2024-08-20 07:19:43,954 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-20 07:20:51,477 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11700, loss[loss=0.1076, beats_loss=0.009908, ecapa_loss=0.0001535, whisper_loss=0.09621, over 21558.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01027, ecapa_loss=0.0001416, whisper_loss=0.08966, over 3781749.64 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:20:52,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0
2024-08-20 07:21:22,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4710900.0, ans=0.125
2024-08-20 07:21:28,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4710900.0, ans=0.1
2024-08-20 07:21:33,577 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 26 from LS+wenet, 17 from Vox, 26 from AS
2024-08-20 07:21:45,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4711000.0, ans=0.0
2024-08-20 07:21:49,089 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 31 from LS+wenet, 24 from Vox, 40 from AS
2024-08-20 07:21:53,581 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 30 from LS+wenet, 28 from Vox, 26 from AS
2024-08-20 07:21:53,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4711100.0, ans=0.125
2024-08-20 07:21:55,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.94 vs. limit=10.0
2024-08-20 07:22:18,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0
2024-08-20 07:22:22,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5
2024-08-20 07:22:31,998 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11750, loss[loss=0.09269, beats_loss=0.01298, ecapa_loss=0.0001066, whisper_loss=0.07864, over 23195.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01027, ecapa_loss=0.0001398, whisper_loss=0.09055, over 3824676.70 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:22:55,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.306e+01 2.569e+01 2.913e+01 4.079e+01, threshold=5.137e+01, percent-clipped=0.0
2024-08-20 07:22:56,615 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 20 from LS+wenet, 28 from Vox, 26 from AS
2024-08-20 07:23:27,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4711500.0, ans=0.125
2024-08-20 07:23:56,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5
2024-08-20 07:24:18,344 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11800, loss[loss=0.1118, beats_loss=0.01041, ecapa_loss=0.0001407, whisper_loss=0.1, over 19910.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01026, ecapa_loss=0.0001392, whisper_loss=0.09071, over 3802620.42 frames. ], batch size: 78, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:24:23,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0
2024-08-20 07:24:35,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4711800.0, ans=0.125
2024-08-20 07:24:58,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4712000.0, ans=0.125
2024-08-20 07:25:01,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4712000.0, ans=0.125
2024-08-20 07:25:05,318 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 29 from LS+wenet, 16 from Vox, 39 from AS
2024-08-20 07:25:18,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.56 vs. limit=22.5
2024-08-20 07:25:45,479 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 15 from LS+wenet, 15 from Vox, 20 from AS
2024-08-20 07:26:02,561 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11850, loss[loss=0.09106, beats_loss=0.01104, ecapa_loss=0.0001375, whisper_loss=0.07865, over 21411.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001395, whisper_loss=0.08965, over 3837311.38 frames. ], batch size: 83, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:26:03,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4712300.0, ans=0.2
2024-08-20 07:26:03,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.11 vs. limit=15.0
2024-08-20 07:26:26,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.335e+01 2.520e+01 2.883e+01 3.441e+02, threshold=5.040e+01, percent-clipped=1.0
2024-08-20 07:26:29,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4712400.0, ans=0.95
2024-08-20 07:26:51,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4712500.0, ans=0.125
2024-08-20 07:26:51,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4712500.0, ans=0.0
2024-08-20 07:26:55,557 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 21 from LS+wenet, 20 from Vox, 28 from AS
2024-08-20 07:27:10,069 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 07:27:14,624 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 12 from LS+wenet, 17 from Vox, 23 from AS
2024-08-20 07:27:16,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=22.5
2024-08-20 07:27:46,196 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11900, loss[loss=0.0975, beats_loss=0.01065, ecapa_loss=0.0001387, whisper_loss=0.08547, over 20627.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.08971, over 3841432.95 frames. ], batch size: 81, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:27:46,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4712800.0, ans=0.0
2024-08-20 07:28:14,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0
2024-08-20 07:28:27,355 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0
2024-08-20 07:28:30,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4713000.0, ans=0.1
2024-08-20 07:28:36,259 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 from AS
2024-08-20 07:28:44,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4713100.0, ans=0.0
2024-08-20 07:28:50,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0
2024-08-20 07:28:53,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4713100.0, ans=0.125
2024-08-20 07:28:57,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4713100.0, ans=0.2
2024-08-20 07:29:15,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4713200.0, ans=0.125
2024-08-20 07:29:23,020 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 11950, loss[loss=0.1146, beats_loss=0.009433, ecapa_loss=0.0001708, whisper_loss=0.1034, over 21929.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001411, whisper_loss=0.0903, over 3845404.10 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:29:33,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4713300.0, ans=0.1
2024-08-20 07:29:43,869 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.385e+01 2.581e+01 2.869e+01 3.753e+01, threshold=5.162e+01, percent-clipped=0.0
2024-08-20 07:29:46,451 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.431e+00
2024-08-20 07:29:48,026 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 16 from LS+wenet, 13 from Vox, 23 from AS
2024-08-20 07:29:53,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4713400.0, ans=0.125
2024-08-20 07:30:15,776 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 23 from LS+wenet, 14 from Vox, 39 from AS
2024-08-20 07:30:31,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.30 vs. limit=10.0
2024-08-20 07:30:40,483 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-20 07:30:44,037 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 from AS
2024-08-20 07:30:49,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0
2024-08-20 07:30:57,721 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12000, loss[loss=0.08148, beats_loss=0.009786, ecapa_loss=0.0001405, whisper_loss=0.07029, over 13062.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.000139, whisper_loss=0.09058, over 3838922.92 frames. ], batch size: 52, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:30:57,722 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss
2024-08-20 07:31:33,834 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005087, whisper_loss=0.2481, over 931116.00 frames.
2024-08-20 07:31:50,467 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9311, 3.6380, 4.6753, 4.5695], device='cuda:0')
2024-08-20 07:31:56,098 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on SV_voxceleb1: loss=0.003908, beats_loss=0, ecapa_loss=0.0003908, whisper_loss=0, over 944235.00 frames.
2024-08-20 07:33:09,193 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.9058, 1.7889, 1.8350, 1.2947, 1.6242, 2.0521, 2.3982, 1.5815], device='cuda:0')
2024-08-20 07:33:38,228 INFO [train_multi_KD3.py:1150] (0/4) Epoch 32, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-20 07:33:38,232 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB
2024-08-20 07:33:43,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4713800.0, ans=0.125
2024-08-20 07:33:51,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0
2024-08-20 07:35:04,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4714300.0, ans=0.125
2024-08-20 07:35:06,503 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12050, loss[loss=0.09871, beats_loss=0.01226, ecapa_loss=0.0001352, whisper_loss=0.0851, over 22232.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001389, whisper_loss=0.09043, over 3839692.73 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:35:06,681 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 from AS
2024-08-20 07:35:13,349 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-20 07:35:16,124 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 from AS
2024-08-20 07:35:25,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.124e+01 2.460e+01 2.784e+01 4.386e+01, threshold=4.920e+01, percent-clipped=0.0
2024-08-20 07:35:55,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4714500.0, ans=0.1
2024-08-20 07:36:18,295 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 from AS
2024-08-20 07:36:19,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4714700.0, ans=0.2
2024-08-20 07:36:23,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4714700.0, ans=0.125
2024-08-20 07:36:39,070 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12100, loss[loss=0.09621, beats_loss=0.01159, ecapa_loss=0.0001502, whisper_loss=0.08312, over 12648.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001392, whisper_loss=0.09079, over 3826037.67 frames. ], batch size: 51, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:37:14,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4714900.0, ans=0.2
2024-08-20 07:37:19,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4715000.0, ans=0.125
2024-08-20 07:37:24,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4715000.0, ans=0.125
2024-08-20 07:37:26,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4715000.0, ans=0.125
2024-08-20 07:37:34,620 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 16 from LS+wenet, 20 from Vox, 19 from AS
2024-08-20 07:37:39,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4715000.0, ans=0.1
2024-08-20 07:37:44,056 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 from AS
2024-08-20 07:38:23,891 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12150, loss[loss=0.133, beats_loss=0.007415, ecapa_loss=0.0001546, whisper_loss=0.124, over 22960.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001401, whisper_loss=0.09028, over 3807166.12 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:38:26,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4715300.0, ans=0.125
2024-08-20 07:38:30,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4715300.0, ans=0.1
2024-08-20 07:38:34,672 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 from AS
2024-08-20 07:38:34,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4715300.0, ans=0.125
2024-08-20 07:38:47,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.297e+01 2.567e+01 2.958e+01 5.999e+01, threshold=5.133e+01, percent-clipped=2.0
2024-08-20 07:38:58,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4715400.0, ans=0.2
2024-08-20 07:39:22,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4715600.0, ans=0.125
2024-08-20 07:39:22,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4715600.0, ans=0.125
2024-08-20 07:39:22,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0
2024-08-20 07:39:30,851 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-08-20 07:39:58,933 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12200, loss[loss=0.1035, beats_loss=0.01006, ecapa_loss=0.0001146, whisper_loss=0.0923, over 14209.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.09069, over 3781563.92 frames. ], batch size: 53, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:40:10,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4715800.0, ans=0.125
2024-08-20 07:40:11,647 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS
2024-08-20 07:40:13,607 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 from AS
2024-08-20 07:40:14,968 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 16 from LS+wenet, 23 from Vox, 32 from AS
2024-08-20 07:40:24,284 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.187e+05
2024-08-20 07:40:50,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0
2024-08-20 07:40:53,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4716100.0, ans=0.125
2024-08-20 07:40:55,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4716100.0, ans=0.07
2024-08-20 07:41:11,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4716200.0, ans=0.125
2024-08-20 07:41:21,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4716200.0, ans=0.2
2024-08-20 07:41:25,575 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12250, loss[loss=0.1046, beats_loss=0.00992, ecapa_loss=0.0001613, whisper_loss=0.09306, over 20159.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001398, whisper_loss=0.09055, over 3779732.95 frames. ], batch size: 83, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:41:28,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5
2024-08-20 07:41:36,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4716300.0, ans=0.0
2024-08-20 07:41:38,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4716300.0, ans=0.0
2024-08-20 07:41:45,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.278e+01 2.601e+01 2.929e+01 4.392e+01, threshold=5.202e+01, percent-clipped=0.0
2024-08-20 07:41:54,145 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 19 from LS+wenet, 24 from Vox, 32 from AS
2024-08-20 07:41:59,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4716500.0, ans=0.1
2024-08-20 07:42:04,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4716500.0, ans=0.05
2024-08-20 07:42:05,615 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 13 from LS+wenet, 14 from Vox, 33 from AS
2024-08-20 07:42:15,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4716500.0, ans=0.0
2024-08-20 07:42:19,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0
2024-08-20 07:42:33,337 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 13 from Vox, 48 from AS
2024-08-20 07:42:38,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4716700.0, ans=0.125
2024-08-20 07:42:53,999 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12300, loss[loss=0.08463, beats_loss=0.008795, ecapa_loss=0.0001261, whisper_loss=0.07457, over 17437.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001385, whisper_loss=0.08991, over 3795498.78 frames. ], batch size: 65, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:42:55,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4716800.0, ans=0.0
2024-08-20 07:43:09,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4716900.0, ans=0.0
2024-08-20 07:43:11,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4716900.0, ans=0.125
2024-08-20 07:43:15,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4716900.0, ans=0.0
2024-08-20 07:43:24,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4716900.0, ans=0.125
2024-08-20 07:43:30,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4717000.0, ans=0.125
2024-08-20 07:43:33,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.93 vs. limit=10.0
2024-08-20 07:43:35,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4717000.0, ans=0.125
2024-08-20 07:43:40,394 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 14 from Vox, 38 from AS
2024-08-20 07:43:54,070 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 19 from LS+wenet, 21 from Vox, 39 from AS
2024-08-20 07:43:54,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4717100.0, ans=0.1
2024-08-20 07:43:55,765 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 15 from Vox, 43 from AS
2024-08-20 07:43:59,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4717100.0, ans=0.0
2024-08-20 07:44:15,540 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 from AS
2024-08-20 07:44:23,195 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12350, loss[loss=0.1242, beats_loss=0.007688, ecapa_loss=0.0001489, whisper_loss=0.115, over 14006.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001386, whisper_loss=0.08989, over 3807971.49 frames. ], batch size: 53, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:44:44,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.339e+01 2.566e+01 2.988e+01 1.086e+02, threshold=5.133e+01, percent-clipped=1.0
2024-08-20 07:44:58,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.66 vs. limit=10.0
2024-08-20 07:45:28,338 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 33 from LS+wenet, 19 from Vox, 26 from AS
2024-08-20 07:45:44,193 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 21 from LS+wenet, 26 from Vox, 27 from AS
2024-08-20 07:45:46,602 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 21 from LS+wenet, 14 from Vox, 23 from AS
2024-08-20 07:45:55,823 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12400, loss[loss=0.1129, beats_loss=0.01013, ecapa_loss=0.0001441, whisper_loss=0.1013, over 22370.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.000139, whisper_loss=0.08971, over 3797511.85 frames. ], batch size: 88, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:46:17,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4717900.0, ans=0.125
2024-08-20 07:46:18,484 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 22 from LS+wenet, 25 from Vox, 40 from AS
2024-08-20 07:46:30,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4718000.0, ans=0.95
2024-08-20 07:46:37,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=4718000.0, ans=0.95
2024-08-20 07:46:44,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4718000.0, ans=0.0
2024-08-20 07:47:07,337 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 12 from LS+wenet, 11 from Vox, 33 from AS
2024-08-20 07:47:09,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4718200.0, ans=0.0
2024-08-20 07:47:23,163 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 07:47:24,630 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12450, loss[loss=0.1005, beats_loss=0.009713, ecapa_loss=0.0001291, whisper_loss=0.08946, over 17247.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001382, whisper_loss=0.0897, over 3814773.56 frames. ], batch size: 67, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:47:29,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4718300.0, ans=0.0
2024-08-20 07:47:33,134 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 15 from LS+wenet, 13 from Vox, 24 from AS
2024-08-20 07:47:43,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.285e+01 2.465e+01 2.724e+01 4.543e+01, threshold=4.931e+01, percent-clipped=0.0
2024-08-20 07:47:44,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4718400.0, ans=0.125
2024-08-20 07:47:45,712 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 23 from Vox, 26 from AS
2024-08-20 07:47:49,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4718400.0, ans=0.125
2024-08-20 07:47:59,296 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 19 from LS+wenet, 23 from Vox, 48 from AS
2024-08-20 07:48:30,980 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS
2024-08-20 07:48:46,002 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 25 from LS+wenet, 26 from Vox, 27 from AS
2024-08-20 07:48:58,435 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12500, loss[loss=0.09725, beats_loss=0.01261, ecapa_loss=0.0001277, whisper_loss=0.08336, over 22755.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001388, whisper_loss=0.08965, over 3780289.12 frames. ], batch size: 95, lr: 1.90e-03, grad_scale: 5.764607523034235e+17
2024-08-20 07:49:42,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4719000.0, ans=0.0
2024-08-20 07:49:44,235 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS
2024-08-20 07:50:00,160 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 21 from LS+wenet, 21 from Vox, 44 from AS
2024-08-20 07:50:14,424 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 from AS
2024-08-20 07:50:25,341 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts.
24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 07:50:28,507 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12550, loss[loss=0.08873, beats_loss=0.0136, ecapa_loss=0.0001315, whisper_loss=0.07382, over 23003.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001387, whisper_loss=0.09015, over 3835566.06 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:50:30,932 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 07:50:47,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4719400.0, ans=0.0 2024-08-20 07:50:48,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.246e+01 2.517e+01 2.953e+01 4.531e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-20 07:50:48,715 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 38 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 07:50:58,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0 2024-08-20 07:51:18,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4719500.0, ans=0.0 2024-08-20 07:51:39,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4719700.0, ans=0.125 2024-08-20 07:51:39,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4719700.0, ans=0.2 2024-08-20 07:51:59,659 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12600, loss[loss=0.09014, beats_loss=0.01006, ecapa_loss=0.0001604, whisper_loss=0.07848, over 14495.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001395, whisper_loss=0.09062, over 3842385.43 frames. 
], batch size: 62, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:52:15,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4719800.0, ans=0.125 2024-08-20 07:52:17,286 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 07:52:23,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4719900.0, ans=0.0 2024-08-20 07:52:34,805 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-472000.pt 2024-08-20 07:52:38,505 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 07:52:40,253 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 35 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 07:53:01,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.42 vs. limit=22.5 2024-08-20 07:53:07,135 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 26 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 07:53:08,630 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 07:53:33,457 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12650, loss[loss=0.07494, beats_loss=0.01316, ecapa_loss=0.0001361, whisper_loss=0.06042, over 19794.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01027, ecapa_loss=0.0001399, whisper_loss=0.09089, over 3819698.44 frames. ], batch size: 83, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:53:51,920 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 07:53:52,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4720400.0, ans=0.2 2024-08-20 07:53:52,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4720400.0, ans=0.0 2024-08-20 07:53:53,477 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.384e+01 2.676e+01 2.977e+01 1.190e+02, threshold=5.353e+01, percent-clipped=5.0 2024-08-20 07:53:59,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.83 vs. limit=22.5 2024-08-20 07:54:04,569 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-20 07:54:09,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4720500.0, ans=0.1 2024-08-20 07:54:13,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4720500.0, ans=0.1 2024-08-20 07:54:20,918 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 07:54:30,030 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 11 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 07:54:30,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4720600.0, ans=0.125 2024-08-20 07:54:41,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=12.0 2024-08-20 07:54:57,808 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 07:55:02,703 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12700, loss[loss=0.1072, beats_loss=0.01186, ecapa_loss=0.0001032, whisper_loss=0.09428, over 19952.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.09031, over 3832322.03 frames. ], batch size: 76, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:55:15,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4720800.0, ans=0.1 2024-08-20 07:55:21,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4720900.0, ans=0.125 2024-08-20 07:55:52,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2024-08-20 07:56:32,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4721300.0, ans=0.125 2024-08-20 07:56:34,115 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12750, loss[loss=0.07972, beats_loss=0.01257, ecapa_loss=0.0001471, whisper_loss=0.06568, over 15617.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001398, whisper_loss=0.08902, over 3797542.90 frames. ], batch size: 63, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:56:45,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.90 vs. 
limit=15.0 2024-08-20 07:56:52,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.492e+01 2.698e+01 4.644e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-20 07:56:55,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-08-20 07:57:02,183 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 18 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-20 07:57:24,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=12.0 2024-08-20 07:57:52,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4721700.0, ans=0.125 2024-08-20 07:58:02,309 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12800, loss[loss=0.1149, beats_loss=0.007883, ecapa_loss=0.0001408, whisper_loss=0.1056, over 21426.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.08966, over 3804258.24 frames. ], batch size: 85, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:58:13,724 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.720e-01 2024-08-20 07:58:19,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4721900.0, ans=0.0 2024-08-20 07:58:24,112 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 07:59:36,530 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12850, loss[loss=0.08961, beats_loss=0.01059, ecapa_loss=0.0001752, whisper_loss=0.07726, over 21590.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.09034, over 3814973.73 frames. 
], batch size: 94, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 07:59:54,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=4722400.0, ans=10.0 2024-08-20 07:59:56,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.213e+01 2.468e+01 2.785e+01 8.515e+01, threshold=4.935e+01, percent-clipped=2.0 2024-08-20 08:00:13,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4722500.0, ans=0.2 2024-08-20 08:00:37,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4722600.0, ans=0.2 2024-08-20 08:00:42,500 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 08:00:46,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4722700.0, ans=0.0 2024-08-20 08:01:04,943 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12900, loss[loss=0.08661, beats_loss=0.01092, ecapa_loss=0.0001395, whisper_loss=0.07429, over 18654.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.09049, over 3792214.61 frames. ], batch size: 74, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:01:21,618 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 19 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 08:01:25,364 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:01:54,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.58 vs. 
limit=15.0 2024-08-20 08:02:01,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4723100.0, ans=0.125 2024-08-20 08:02:05,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4723100.0, ans=0.125 2024-08-20 08:02:29,979 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 13 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 08:02:32,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4723200.0, ans=0.1 2024-08-20 08:02:35,022 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 12950, loss[loss=0.1058, beats_loss=0.01054, ecapa_loss=0.0001504, whisper_loss=0.09373, over 16546.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001383, whisper_loss=0.09006, over 3785984.70 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:02:47,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4723300.0, ans=0.0 2024-08-20 08:02:56,158 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.320e+01 2.461e+01 2.898e+01 1.890e+02, threshold=4.922e+01, percent-clipped=4.0 2024-08-20 08:03:06,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=15.0 2024-08-20 08:03:18,618 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 08:03:32,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4723600.0, ans=0.09899494936611666 2024-08-20 08:03:36,550 WARNING [optim.py:496] (0/4) Scaling gradients by 0.012246229685842991, model_norm_threshold=49.221561431884766 2024-08-20 08:03:36,707 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.214e+06, grad_sumsq=3.009e+08, orig_rms_sq=1.068e-02 2024-08-20 08:03:42,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2024-08-20 08:03:47,704 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 27 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-20 08:04:05,895 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 08:04:06,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4723800.0, ans=0.125 2024-08-20 08:04:07,920 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13000, loss[loss=0.09272, beats_loss=0.0129, ecapa_loss=0.0001388, whisper_loss=0.07844, over 19264.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001379, whisper_loss=0.09012, over 3822758.54 frames. ], batch size: 80, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:04:08,090 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 08:04:45,932 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 29 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 08:04:48,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.23 vs. 
limit=12.0 2024-08-20 08:04:53,515 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 24 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-20 08:04:57,760 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 21 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 08:05:02,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4724100.0, ans=0.125 2024-08-20 08:05:04,419 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 32 from LS+wenet, 31 from Vox, 24 fro AS 2024-08-20 08:05:11,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4724100.0, ans=0.125 2024-08-20 08:05:16,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=10.0 2024-08-20 08:05:32,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4724200.0, ans=0.5 2024-08-20 08:05:41,805 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13050, loss[loss=0.1077, beats_loss=0.007928, ecapa_loss=0.0001835, whisper_loss=0.09797, over 15074.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001392, whisper_loss=0.09024, over 3800109.60 frames. ], batch size: 64, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:05:44,110 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
19 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 08:05:45,789 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03136618062853813, model_norm_threshold=49.221561431884766 2024-08-20 08:05:45,949 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.784e+05, grad_sumsq=1.450e+05, orig_rms_sq=3.300e+00 2024-08-20 08:05:47,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4724300.0, ans=0.125 2024-08-20 08:05:57,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=4724300.0, ans=0.025 2024-08-20 08:06:03,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.318e+01 2.513e+01 2.841e+01 4.019e+03, threshold=5.026e+01, percent-clipped=3.0 2024-08-20 08:06:12,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4724400.0, ans=0.125 2024-08-20 08:06:29,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4724500.0, ans=0.1 2024-08-20 08:06:39,027 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-20 08:06:52,131 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 08:07:12,193 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
26 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 08:07:17,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4724700.0, ans=0.125 2024-08-20 08:07:21,151 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13100, loss[loss=0.1114, beats_loss=0.009646, ecapa_loss=0.0001498, whisper_loss=0.1003, over 23166.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01059, ecapa_loss=0.0001395, whisper_loss=0.08966, over 3773197.12 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:07:24,422 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-20 08:07:35,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4724800.0, ans=0.0 2024-08-20 08:07:46,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4724900.0, ans=0.125 2024-08-20 08:07:50,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4724900.0, ans=0.0 2024-08-20 08:08:26,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.17 vs. 
limit=6.0 2024-08-20 08:08:27,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4725100.0, ans=0.2 2024-08-20 08:08:29,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4725100.0, ans=0.0 2024-08-20 08:08:49,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4725200.0, ans=0.0 2024-08-20 08:08:49,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4725200.0, ans=0.2 2024-08-20 08:08:54,638 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13150, loss[loss=0.1049, beats_loss=0.009661, ecapa_loss=0.0001642, whisper_loss=0.09363, over 17874.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01061, ecapa_loss=0.0001402, whisper_loss=0.08875, over 3742557.79 frames. ], batch size: 76, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:08:55,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4725300.0, ans=0.125 2024-08-20 08:09:10,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4725300.0, ans=0.0 2024-08-20 08:09:16,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.495e+01 2.265e+01 2.500e+01 2.860e+01 8.543e+01, threshold=5.000e+01, percent-clipped=2.0 2024-08-20 08:09:34,402 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 26 from LS+wenet, 19 from Vox, 13 fro AS 2024-08-20 08:09:34,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4725500.0, ans=0.0 2024-08-20 08:09:36,466 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 08:09:38,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4725500.0, ans=0.04949747468305833 2024-08-20 08:09:38,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-20 08:09:38,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-08-20 08:09:41,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4725500.0, ans=0.035 2024-08-20 08:10:06,314 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 18 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-20 08:10:17,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4725700.0, ans=0.0 2024-08-20 08:10:31,883 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13200, loss[loss=0.1298, beats_loss=0.008897, ecapa_loss=0.0001313, whisper_loss=0.1195, over 15791.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.08869, over 3735571.95 frames. ], batch size: 59, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:10:39,366 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 21 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 08:10:39,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4725800.0, ans=0.1 2024-08-20 08:10:41,045 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 08:10:46,799 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 08:11:05,507 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 08:11:07,467 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 32 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 08:11:41,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4726100.0, ans=0.125 2024-08-20 08:12:00,810 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 28 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-20 08:12:05,809 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13250, loss[loss=0.09768, beats_loss=0.01079, ecapa_loss=0.0001546, whisper_loss=0.08534, over 22651.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001405, whisper_loss=0.08901, over 3770912.34 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:12:15,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4726300.0, ans=0.2 2024-08-20 08:12:19,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4726300.0, ans=0.125 2024-08-20 08:12:22,864 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 19 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-20 08:12:24,683 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
28 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 08:12:26,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.372e+01 2.599e+01 3.015e+01 7.004e+01, threshold=5.197e+01, percent-clipped=3.0 2024-08-20 08:12:57,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4726500.0, ans=0.125 2024-08-20 08:13:00,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.75 vs. limit=5.0 2024-08-20 08:13:05,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4726600.0, ans=0.125 2024-08-20 08:13:40,488 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13300, loss[loss=0.1059, beats_loss=0.009296, ecapa_loss=0.0001403, whisper_loss=0.09515, over 23711.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001402, whisper_loss=0.08911, over 3788105.24 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:13:41,267 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2024-08-20 08:13:55,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2024-08-20 08:14:01,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4726900.0, ans=0.125 2024-08-20 08:14:13,223 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 08:14:13,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4726900.0, ans=0.0 2024-08-20 08:14:17,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4727000.0, ans=0.0 2024-08-20 08:14:26,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4727000.0, ans=0.0 2024-08-20 08:14:43,271 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 08:15:05,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4727200.0, ans=0.025 2024-08-20 08:15:14,032 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13350, loss[loss=0.1028, beats_loss=0.0117, ecapa_loss=0.0001211, whisper_loss=0.08986, over 20352.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01054, ecapa_loss=0.0001396, whisper_loss=0.08907, over 3804487.47 frames. 
], batch size: 82, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:15:34,090 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.328e+01 2.528e+01 2.746e+01 3.166e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-20 08:15:47,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4727400.0, ans=0.125 2024-08-20 08:15:53,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4727500.0, ans=0.0 2024-08-20 08:16:05,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4727500.0, ans=0.0 2024-08-20 08:16:18,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4727600.0, ans=0.05 2024-08-20 08:16:46,248 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13400, loss[loss=0.06976, beats_loss=0.009876, ecapa_loss=0.0001202, whisper_loss=0.05868, over 15520.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.08889, over 3763269.75 frames. ], batch size: 61, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:17:17,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4727900.0, ans=0.5 2024-08-20 08:17:43,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4728100.0, ans=0.0 2024-08-20 08:17:50,156 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
17 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 08:17:52,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4728100.0, ans=0.0 2024-08-20 08:18:10,471 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 27 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 08:18:17,770 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13450, loss[loss=0.09667, beats_loss=0.01124, ecapa_loss=0.0001282, whisper_loss=0.08415, over 15036.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01045, ecapa_loss=0.0001391, whisper_loss=0.08838, over 3739270.08 frames. ], batch size: 60, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:18:20,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4728300.0, ans=0.1 2024-08-20 08:18:39,119 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.294e+01 2.576e+01 2.808e+01 3.727e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-20 08:18:57,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4728500.0, ans=0.09899494936611666 2024-08-20 08:19:22,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4728600.0, ans=0.125 2024-08-20 08:19:27,748 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 
18 from LS+wenet, 16 from Vox, 22 from AS 2024-08-20 08:19:29,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4728600.0, ans=0.1 2024-08-20 08:19:46,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4728700.0, ans=0.0 2024-08-20 08:19:51,638 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13500, loss[loss=0.1118, beats_loss=0.01037, ecapa_loss=0.0001191, whisper_loss=0.1003, over 23413.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.08918, over 3780446.24 frames. ], batch size: 91, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:19:53,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4728800.0, ans=0.0 2024-08-20 08:19:54,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.61 vs. limit=10.0 2024-08-20 08:20:26,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.89 vs. 
limit=15.0 2024-08-20 08:20:33,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4729000.0, ans=0.0 2024-08-20 08:20:37,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4729000.0, ans=0.125 2024-08-20 08:20:58,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4729100.0, ans=0.2 2024-08-20 08:21:02,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4729100.0, ans=0.04949747468305833 2024-08-20 08:21:25,056 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 32 from Vox, 29 from AS 2024-08-20 08:21:31,021 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13550, loss[loss=0.07867, beats_loss=0.01254, ecapa_loss=9.587e-05, whisper_loss=0.06518, over 17039.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001395, whisper_loss=0.08933, over 3788198.51 frames. ], batch size: 65, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:21:52,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.276e+01 2.469e+01 2.814e+01 4.223e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-20 08:21:53,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4729400.0, ans=0.025 2024-08-20 08:21:54,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4729400.0, ans=0.125 2024-08-20 08:22:01,743 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
25 from LS+wenet, 23 from Vox, 35 from AS 2024-08-20 08:22:09,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4729500.0, ans=0.1 2024-08-20 08:22:16,863 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 27 from LS+wenet, 29 from Vox, 39 from AS 2024-08-20 08:22:29,104 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 from AS 2024-08-20 08:22:37,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4729600.0, ans=0.1 2024-08-20 08:23:05,601 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13600, loss[loss=0.1017, beats_loss=0.01154, ecapa_loss=0.0001025, whisper_loss=0.08909, over 15038.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01053, ecapa_loss=0.0001389, whisper_loss=0.089, over 3786842.84 frames. ], batch size: 58, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:23:22,644 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 26 from LS+wenet, 15 from Vox, 35 from AS 2024-08-20 08:24:43,163 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13650, loss[loss=0.1092, beats_loss=0.009143, ecapa_loss=0.0001251, whisper_loss=0.0988, over 15919.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01053, ecapa_loss=0.0001391, whisper_loss=0.0887, over 3770816.56 frames. ], batch size: 61, lr: 1.90e-03, grad_scale: 1.152921504606847e+18 2024-08-20 08:24:58,938 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 16 from LS+wenet, 7 from Vox, 32 from AS 2024-08-20 08:25:00,651 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 22 from LS+wenet, 22 from Vox, 51 from AS 2024-08-20 08:25:03,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.366e+01 2.588e+01 2.806e+01 4.523e+02, threshold=5.175e+01, percent-clipped=1.0 2024-08-20 08:25:09,648 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 from AS 2024-08-20 08:25:28,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4730500.0, ans=0.0 2024-08-20 08:25:34,705 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 from AS 2024-08-20 08:25:43,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4730600.0, ans=0.125 2024-08-20 08:25:51,253 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 35 from LS+wenet, 17 from Vox, 39 from AS 2024-08-20 08:26:15,096 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13700, loss[loss=0.1094, beats_loss=0.01078, ecapa_loss=0.0001361, whisper_loss=0.09726, over 17390.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.08869, over 3762931.54 frames. ], batch size: 70, lr: 1.90e-03, grad_scale: 1.152921504606847e+18 2024-08-20 08:26:16,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4730800.0, ans=0.125 2024-08-20 08:26:21,792 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS 2024-08-20 08:26:26,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.41 vs. limit=15.0 2024-08-20 08:26:39,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-20 08:26:43,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.37 vs. limit=22.5 2024-08-20 08:26:49,457 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 
26 from LS+wenet, 12 from Vox, 18 from AS 2024-08-20 08:27:18,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4731100.0, ans=0.125 2024-08-20 08:27:19,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4731100.0, ans=0.125 2024-08-20 08:27:23,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4731100.0, ans=0.125 2024-08-20 08:27:29,502 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 from AS 2024-08-20 08:27:29,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4731200.0, ans=0.125 2024-08-20 08:27:35,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4731200.0, ans=0.1 2024-08-20 08:27:39,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4731200.0, ans=0.0 2024-08-20 08:27:43,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4731200.0, ans=0.0 2024-08-20 08:27:47,650 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS 2024-08-20 08:27:48,697 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13750, loss[loss=0.09421, beats_loss=0.01007, ecapa_loss=0.0001298, whisper_loss=0.08284, over 18990.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01047, ecapa_loss=0.0001402, whisper_loss=0.08881, over 3805283.85 frames. 
], batch size: 76, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:28:10,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.242e+01 2.511e+01 2.832e+01 4.850e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-20 08:28:10,559 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 27 from LS+wenet, 27 from Vox, 32 from AS 2024-08-20 08:28:32,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4731500.0, ans=0.0 2024-08-20 08:28:35,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4731500.0, ans=0.125 2024-08-20 08:28:45,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4731600.0, ans=0.0 2024-08-20 08:29:00,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4731700.0, ans=0.1 2024-08-20 08:29:22,248 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13800, loss[loss=0.08843, beats_loss=0.01378, ecapa_loss=0.000126, whisper_loss=0.07339, over 22266.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.08924, over 3791955.82 frames. ], batch size: 93, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:29:38,775 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 25 from LS+wenet, 12 from Vox, 25 from AS 2024-08-20 08:29:46,152 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
13 from LS+wenet, 12 from Vox, 26 from AS 2024-08-20 08:29:46,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4731900.0, ans=0.125 2024-08-20 08:29:59,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4732000.0, ans=0.125 2024-08-20 08:30:00,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4732000.0, ans=0.2 2024-08-20 08:30:11,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.62 vs. limit=10.0 2024-08-20 08:30:19,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4732100.0, ans=0.125 2024-08-20 08:30:28,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-08-20 08:30:33,182 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 08:30:38,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4732200.0, ans=0.1 2024-08-20 08:30:54,833 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13850, loss[loss=0.0999, beats_loss=0.01083, ecapa_loss=0.0001308, whisper_loss=0.08777, over 22004.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001409, whisper_loss=0.08973, over 3786062.55 frames. 
], batch size: 86, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:31:02,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4732300.0, ans=0.0 2024-08-20 08:31:02,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4732300.0, ans=0.1 2024-08-20 08:31:02,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2024-08-20 08:31:08,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4732300.0, ans=0.1 2024-08-20 08:31:14,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.72 vs. limit=15.0 2024-08-20 08:31:15,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.631e+01 2.257e+01 2.391e+01 2.623e+01 3.979e+01, threshold=4.782e+01, percent-clipped=0.0 2024-08-20 08:31:23,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4732400.0, ans=0.0 2024-08-20 08:31:25,301 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 from AS 2024-08-20 08:31:36,125 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 24 from LS+wenet, 21 from Vox, 26 from AS 2024-08-20 08:31:38,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4732500.0, ans=0.07 2024-08-20 08:31:43,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.57 vs. 
limit=22.5 2024-08-20 08:32:02,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4732600.0, ans=0.0 2024-08-20 08:32:13,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=12.0 2024-08-20 08:32:23,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4732800.0, ans=0.1 2024-08-20 08:32:26,081 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13900, loss[loss=0.1045, beats_loss=0.01062, ecapa_loss=0.000134, whisper_loss=0.09258, over 17403.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001406, whisper_loss=0.08977, over 3794454.39 frames. ], batch size: 68, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:33:19,756 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 26 from Vox, 36 from AS 2024-08-20 08:33:34,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4733100.0, ans=0.0 2024-08-20 08:33:55,665 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 13950, loss[loss=0.1267, beats_loss=0.008173, ecapa_loss=0.0001598, whisper_loss=0.117, over 15461.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001409, whisper_loss=0.08954, over 3773947.57 frames. ], batch size: 60, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:34:01,926 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 25 from LS+wenet, 19 from Vox, 23 from AS 2024-08-20 08:34:05,477 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-20 08:34:16,080 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
25 from LS+wenet, 25 from Vox, 43 from AS 2024-08-20 08:34:17,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.254e+01 2.490e+01 2.803e+01 3.566e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 08:34:32,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.67 vs. limit=22.5 2024-08-20 08:34:37,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4733500.0, ans=0.125 2024-08-20 08:34:46,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.49 vs. limit=12.0 2024-08-20 08:34:51,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4733600.0, ans=0.1 2024-08-20 08:35:00,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4733600.0, ans=10.0 2024-08-20 08:35:09,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4733700.0, ans=0.2 2024-08-20 08:35:09,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4733700.0, ans=0.125 2024-08-20 08:35:16,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0 2024-08-20 08:35:26,932 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14000, loss[loss=0.1324, beats_loss=0.00772, ecapa_loss=0.000114, whisper_loss=0.1235, over 19935.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001415, whisper_loss=0.08969, over 3788173.46 frames. 
], batch size: 73, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:35:35,035 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.92 vs. limit=6.0 2024-08-20 08:35:38,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4733800.0, ans=0.125 2024-08-20 08:35:46,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4733900.0, ans=0.1 2024-08-20 08:35:59,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4733900.0, ans=0.1 2024-08-20 08:36:06,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4734000.0, ans=0.2 2024-08-20 08:36:23,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4734100.0, ans=0.1 2024-08-20 08:36:30,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.84 vs. limit=22.5 2024-08-20 08:36:33,735 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 26 from LS+wenet, 15 from Vox, 37 from AS 2024-08-20 08:37:00,704 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14050, loss[loss=0.08422, beats_loss=0.01255, ecapa_loss=0.0001403, whisper_loss=0.07026, over 18637.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001411, whisper_loss=0.09094, over 3811220.18 frames. 
], batch size: 78, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:37:22,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.283e+01 2.494e+01 2.922e+01 5.594e+01, threshold=4.987e+01, percent-clipped=2.0 2024-08-20 08:37:30,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4734400.0, ans=0.1 2024-08-20 08:37:32,466 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 from AS 2024-08-20 08:37:37,226 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 23 from LS+wenet, 22 from Vox, 24 from AS 2024-08-20 08:37:43,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4734500.0, ans=0.07 2024-08-20 08:37:44,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=15.0 2024-08-20 08:37:45,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=10.0 2024-08-20 08:37:55,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-20 08:37:56,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4734600.0, ans=0.125 2024-08-20 08:38:13,962 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 15 from Vox, 26 from AS 2024-08-20 08:38:31,622 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14100, loss[loss=0.09107, beats_loss=0.01054, ecapa_loss=0.0001298, whisper_loss=0.07924, over 20506.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.09051, over 3823725.34 frames. 
], batch size: 82, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:38:32,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2024-08-20 08:38:33,566 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 17 from LS+wenet, 18 from Vox, 40 from AS 2024-08-20 08:38:45,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=12.0 2024-08-20 08:38:57,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4734900.0, ans=0.1 2024-08-20 08:39:20,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4735000.0, ans=0.1 2024-08-20 08:39:54,933 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 23 from LS+wenet, 32 from Vox, 31 from AS 2024-08-20 08:40:01,882 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14150, loss[loss=0.1076, beats_loss=0.009775, ecapa_loss=0.0001576, whisper_loss=0.09625, over 22499.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01031, ecapa_loss=0.0001404, whisper_loss=0.09147, over 3855944.92 frames. ], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:40:02,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4735300.0, ans=0.0 2024-08-20 08:40:03,995 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 20 from Vox, 44 from AS 2024-08-20 08:40:15,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.26 vs. 
limit=12.0 2024-08-20 08:40:20,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4735400.0, ans=0.1 2024-08-20 08:40:22,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4735400.0, ans=0.125 2024-08-20 08:40:22,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.235e+01 2.465e+01 2.825e+01 7.434e+01, threshold=4.929e+01, percent-clipped=1.0 2024-08-20 08:40:28,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4735400.0, ans=0.125 2024-08-20 08:40:31,935 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 23 from LS+wenet, 30 from Vox, 29 from AS 2024-08-20 08:40:34,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4735400.0, ans=0.125 2024-08-20 08:40:41,814 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 27 from Vox, 37 from AS 2024-08-20 08:40:47,453 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.379e+01 2024-08-20 08:40:48,805 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 from AS 2024-08-20 08:40:52,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4735500.0, ans=0.2 2024-08-20 08:41:02,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4735600.0, ans=0.0 2024-08-20 08:41:06,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.55 vs. 
limit=10.0 2024-08-20 08:41:30,424 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14200, loss[loss=0.08431, beats_loss=0.01132, ecapa_loss=0.0001431, whisper_loss=0.07156, over 15087.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01032, ecapa_loss=0.0001408, whisper_loss=0.09158, over 3861279.61 frames. ], batch size: 61, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:41:35,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4735800.0, ans=0.125 2024-08-20 08:42:15,607 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 from AS 2024-08-20 08:42:40,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4736100.0, ans=0.125 2024-08-20 08:42:48,732 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 20 from LS+wenet, 20 from Vox, 16 from AS 2024-08-20 08:42:48,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4736200.0, ans=0.2 2024-08-20 08:42:55,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4736200.0, ans=0.0 2024-08-20 08:43:00,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4736300.0, ans=0.95 2024-08-20 08:43:02,144 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14250, loss[loss=0.1151, beats_loss=0.008842, ecapa_loss=0.0001689, whisper_loss=0.1046, over 23052.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01032, ecapa_loss=0.0001403, whisper_loss=0.09156, over 3846494.67 frames. 
], batch size: 92, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:43:24,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.313e+01 2.520e+01 2.754e+01 4.470e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-20 08:43:30,272 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 19 from LS+wenet, 16 from Vox, 29 from AS 2024-08-20 08:43:33,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2024-08-20 08:43:52,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4736500.0, ans=0.125 2024-08-20 08:44:07,619 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-20 08:44:16,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.23 vs. limit=10.0 2024-08-20 08:44:18,050 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 24 from LS+wenet, 31 from Vox, 30 from AS 2024-08-20 08:44:18,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4736700.0, ans=0.0 2024-08-20 08:44:25,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-20 08:44:35,088 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14300, loss[loss=0.0787, beats_loss=0.01002, ecapa_loss=0.0001492, whisper_loss=0.06719, over 15768.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01035, ecapa_loss=0.0001397, whisper_loss=0.09133, over 3856389.28 frames. 
], batch size: 64, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:44:42,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4736800.0, ans=0.2 2024-08-20 08:45:24,454 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 from AS 2024-08-20 08:45:27,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4737000.0, ans=0.2 2024-08-20 08:45:43,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4737100.0, ans=0.1 2024-08-20 08:45:46,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4737200.0, ans=0.125 2024-08-20 08:46:04,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4737300.0, ans=0.0 2024-08-20 08:46:05,494 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14350, loss[loss=0.1112, beats_loss=0.01165, ecapa_loss=0.0001091, whisper_loss=0.09842, over 22698.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01033, ecapa_loss=0.0001394, whisper_loss=0.09165, over 3859567.36 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:46:23,436 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 from AS 2024-08-20 08:46:26,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.359e+01 2.648e+01 3.006e+01 2.772e+02, threshold=5.296e+01, percent-clipped=2.0 2024-08-20 08:46:45,267 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
14 from LS+wenet, 28 from Vox, 33 from AS 2024-08-20 08:46:45,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4737500.0, ans=0.09899494936611666 2024-08-20 08:47:01,169 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-20 08:47:08,042 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 21 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 08:47:14,635 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 13 from Vox, 34 from AS 2024-08-20 08:47:32,705 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14400, loss[loss=0.1089, beats_loss=0.009344, ecapa_loss=0.0001476, whisper_loss=0.09813, over 12950.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001395, whisper_loss=0.09108, over 3830819.81 frames. ], batch size: 53, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:47:51,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4737900.0, ans=0.04949747468305833 2024-08-20 08:48:02,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4737900.0, ans=0.0 2024-08-20 08:48:18,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4738000.0, ans=0.1 2024-08-20 08:48:29,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4738100.0, ans=0.1 2024-08-20 08:48:45,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. 
limit=15.0 2024-08-20 08:49:03,615 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14450, loss[loss=0.08769, beats_loss=0.01147, ecapa_loss=0.000113, whisper_loss=0.07509, over 16109.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01028, ecapa_loss=0.0001402, whisper_loss=0.09134, over 3818074.16 frames. ], batch size: 62, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:49:11,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2024-08-20 08:49:14,457 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 08:49:14,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4738300.0, ans=0.1 2024-08-20 08:49:19,866 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 08:49:24,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.293e+01 2.479e+01 2.732e+01 7.579e+01, threshold=4.957e+01, percent-clipped=1.0 2024-08-20 08:49:27,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2024-08-20 08:49:29,060 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 08:49:44,635 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 08:50:24,710 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 08:50:34,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.95 vs. 
limit=15.0 2024-08-20 08:50:37,006 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14500, loss[loss=0.1095, beats_loss=0.009732, ecapa_loss=0.0001506, whisper_loss=0.09827, over 17032.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01025, ecapa_loss=0.0001409, whisper_loss=0.09158, over 3852544.61 frames. ], batch size: 68, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:50:55,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4738900.0, ans=0.0 2024-08-20 08:51:08,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4738900.0, ans=0.125 2024-08-20 08:51:08,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4738900.0, ans=0.1 2024-08-20 08:51:36,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2024-08-20 08:51:42,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4739100.0, ans=0.0 2024-08-20 08:51:44,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4739100.0, ans=0.125 2024-08-20 08:51:46,046 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 08:51:55,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4739200.0, ans=10.0 2024-08-20 08:51:59,113 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 08:52:01,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4739200.0, ans=0.1 2024-08-20 08:52:11,786 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14550, loss[loss=0.1139, beats_loss=0.008534, ecapa_loss=0.0001383, whisper_loss=0.104, over 15180.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01021, ecapa_loss=0.000141, whisper_loss=0.09162, over 3847493.34 frames. ], batch size: 60, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:52:18,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4739300.0, ans=0.125 2024-08-20 08:52:34,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.256e+01 2.477e+01 2.723e+01 4.705e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 08:52:54,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4739500.0, ans=0.125 2024-08-20 08:52:54,935 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 08:53:44,327 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14600, loss[loss=0.1036, beats_loss=0.01049, ecapa_loss=0.0001446, whisper_loss=0.09162, over 20658.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01029, ecapa_loss=0.0001405, whisper_loss=0.09044, over 3857886.58 frames. ], batch size: 84, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:53:52,479 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 29 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 08:54:11,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4739900.0, ans=0.1 2024-08-20 08:54:24,143 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 
17 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-20 08:54:25,909 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 08:54:27,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4740000.0, ans=0.1 2024-08-20 08:54:48,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-20 08:54:51,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4740100.0, ans=0.0 2024-08-20 08:54:51,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4740100.0, ans=0.1 2024-08-20 08:55:01,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-20 08:55:16,415 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14650, loss[loss=0.08947, beats_loss=0.01091, ecapa_loss=0.0001119, whisper_loss=0.07744, over 17401.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01028, ecapa_loss=0.0001399, whisper_loss=0.09052, over 3839703.14 frames. 
], batch size: 66, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:55:38,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.329e+01 2.529e+01 2.848e+01 4.887e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-20 08:55:56,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4740500.0, ans=0.125 2024-08-20 08:56:34,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4740700.0, ans=0.125 2024-08-20 08:56:37,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4740700.0, ans=0.125 2024-08-20 08:56:39,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4740700.0, ans=0.0 2024-08-20 08:56:43,430 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.548e-03 2024-08-20 08:56:45,541 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14700, loss[loss=0.06699, beats_loss=0.01402, ecapa_loss=0.0001162, whisper_loss=0.05181, over 14263.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01025, ecapa_loss=0.0001401, whisper_loss=0.09095, over 3849564.21 frames. ], batch size: 57, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:56:45,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4740800.0, ans=0.0 2024-08-20 08:56:54,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.84 vs. 
limit=15.0 2024-08-20 08:57:00,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4740800.0, ans=0.0 2024-08-20 08:57:10,621 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 08:57:33,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4741000.0, ans=0.0 2024-08-20 08:58:03,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4741200.0, ans=0.07 2024-08-20 08:58:15,522 INFO [train_multi_KD3.py:1117] (0/4) Epoch 32, batch 14750, loss[loss=0.08876, beats_loss=0.011, ecapa_loss=0.0001584, whisper_loss=0.07618, over 21177.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001407, whisper_loss=0.09044, over 3847829.05 frames. ], batch size: 90, lr: 1.90e-03, grad_scale: 5.764607523034235e+17 2024-08-20 08:58:36,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.385e+01 2.604e+01 3.059e+01 5.323e+01, threshold=5.208e+01, percent-clipped=1.0 2024-08-20 08:59:26,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4741700.0, ans=0.125 2024-08-20 08:59:26,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4741700.0, ans=0.0 2024-08-20 08:59:29,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4741700.0, ans=0.04949747468305833 2024-08-20 08:59:39,341 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-32.pt 2024-08-20 09:00:13,078 INFO 
[train_multi_KD3.py:1117] (0/4) Epoch 33, batch 0, loss[loss=0.11, beats_loss=0.008682, ecapa_loss=0.0001551, whisper_loss=0.09981, over 22506.00 frames. ], tot_loss[loss=0.11, beats_loss=0.008682, ecapa_loss=0.0001551, whisper_loss=0.09981, over 22506.00 frames. ], batch size: 91, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:00:13,079 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 09:00:48,186 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005003, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 09:01:09,285 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on SV_voxceleb1: loss=0.003963, beats_loss=0, ecapa_loss=0.0003963, whisper_loss=0, over 944235.00 frames. 2024-08-20 09:02:51,244 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on AT_audioset: loss=0.02307, beats_loss=0.02307, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 09:02:51,247 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 09:02:53,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4741780.0, ans=0.125 2024-08-20 09:02:55,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4741780.0, ans=0.0 2024-08-20 09:02:59,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4741780.0, ans=0.125 2024-08-20 09:03:33,591 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 09:03:41,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4741980.0, ans=0.0 2024-08-20 09:03:56,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4741980.0, ans=10.0 2024-08-20 09:03:58,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4741980.0, ans=0.1 2024-08-20 09:04:14,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4742080.0, ans=0.125 2024-08-20 09:04:19,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4742080.0, ans=0.0 2024-08-20 09:04:32,081 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 32 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 09:04:35,067 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 20 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 09:04:38,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-20 09:04:57,994 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 50, loss[loss=0.1041, beats_loss=0.00712, ecapa_loss=0.0001864, whisper_loss=0.09513, over 16005.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009133, ecapa_loss=0.000145, whisper_loss=0.09122, over 917519.08 frames. ], batch size: 63, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:05:01,296 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-20 09:05:11,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4742280.0, ans=0.125 2024-08-20 09:05:26,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-08-20 09:05:31,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+01 2.495e+01 2.772e+01 3.142e+01 4.372e+01, threshold=5.543e+01, percent-clipped=0.0 2024-08-20 09:05:32,827 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 09:05:53,584 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 22 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 09:06:39,021 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 27 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-20 09:06:39,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4742680.0, ans=0.07 2024-08-20 09:06:53,975 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 100, loss[loss=0.1048, beats_loss=0.005013, ecapa_loss=0.0002093, whisper_loss=0.09772, over 13471.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009171, ecapa_loss=0.000141, whisper_loss=0.09123, over 1548762.55 frames. ], batch size: 53, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:07:06,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=4742780.0, ans=10.0 2024-08-20 09:07:30,060 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 18 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-20 09:07:36,525 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
24 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 09:07:40,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4742980.0, ans=0.125 2024-08-20 09:07:58,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.12 vs. limit=15.0 2024-08-20 09:08:08,437 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 09:08:24,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4743180.0, ans=0.125 2024-08-20 09:08:42,785 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 09:08:43,848 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 150, loss[loss=0.1092, beats_loss=0.009654, ecapa_loss=0.0001455, whisper_loss=0.09807, over 15231.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.009243, ecapa_loss=0.0001421, whisper_loss=0.09013, over 2040710.44 frames. ], batch size: 60, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:08:49,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2024-08-20 09:09:02,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4743380.0, ans=0.125 2024-08-20 09:09:04,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4743380.0, ans=0.0 2024-08-20 09:09:11,139 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.463e+01 2.692e+01 3.124e+01 4.669e+01, threshold=5.384e+01, percent-clipped=0.0 2024-08-20 09:09:17,387 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-20 09:09:25,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4743480.0, ans=0.0 2024-08-20 09:09:31,281 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:09:31,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4743480.0, ans=0.125 2024-08-20 09:09:32,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4743480.0, ans=0.125 2024-08-20 09:09:42,455 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 25 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 09:09:58,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4743680.0, ans=0.0 2024-08-20 09:10:02,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4743680.0, ans=0.2 2024-08-20 09:10:11,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4743680.0, ans=0.125 2024-08-20 09:10:17,686 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 200, loss[loss=0.1117, beats_loss=0.01038, ecapa_loss=0.0001299, whisper_loss=0.09999, over 21093.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.009476, ecapa_loss=0.0001402, whisper_loss=0.0912, over 2433367.37 frames. ], batch size: 79, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:10:19,853 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 09:10:23,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. 
limit=22.5 2024-08-20 09:10:29,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2024-08-20 09:10:41,381 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-20 09:10:50,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4743880.0, ans=0.125 2024-08-20 09:11:06,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4743980.0, ans=0.125 2024-08-20 09:11:10,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4744080.0, ans=0.2 2024-08-20 09:11:35,454 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 34 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 09:11:45,345 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 250, loss[loss=0.07614, beats_loss=0.01072, ecapa_loss=0.000124, whisper_loss=0.06419, over 14262.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.009637, ecapa_loss=0.0001397, whisper_loss=0.09156, over 2742980.90 frames. ], batch size: 54, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:11:51,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4744280.0, ans=0.0 2024-08-20 09:11:56,157 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 09:12:01,210 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 22 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 09:12:09,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.364e+01 2.600e+01 2.936e+01 1.943e+02, threshold=5.200e+01, percent-clipped=2.0 2024-08-20 09:12:20,435 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-20 09:12:24,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4744480.0, ans=0.0 2024-08-20 09:12:29,510 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 30 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 09:12:53,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4744580.0, ans=0.125 2024-08-20 09:12:55,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4744680.0, ans=0.07 2024-08-20 09:13:13,944 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 300, loss[loss=0.09844, beats_loss=0.007612, ecapa_loss=0.0001756, whisper_loss=0.08907, over 17019.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009811, ecapa_loss=0.0001413, whisper_loss=0.09103, over 2983399.82 frames. ], batch size: 71, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:13:15,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4744780.0, ans=0.1 2024-08-20 09:13:29,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4744780.0, ans=0.05 2024-08-20 09:14:30,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=4745180.0, ans=0.05 2024-08-20 09:14:43,501 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 350, loss[loss=0.08802, beats_loss=0.01147, ecapa_loss=0.0001549, whisper_loss=0.075, over 21791.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009971, ecapa_loss=0.0001412, whisper_loss=0.09045, over 3154457.81 frames. ], batch size: 89, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:14:51,238 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
30 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 09:14:54,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4745280.0, ans=0.025 2024-08-20 09:15:07,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4745380.0, ans=0.1 2024-08-20 09:15:08,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.237e+01 2.517e+01 2.824e+01 3.334e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-20 09:15:45,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2024-08-20 09:15:55,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4745680.0, ans=0.1 2024-08-20 09:16:07,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=12.0 2024-08-20 09:16:15,546 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 400, loss[loss=0.08998, beats_loss=0.01248, ecapa_loss=0.0001132, whisper_loss=0.07637, over 16468.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009998, ecapa_loss=0.000141, whisper_loss=0.09026, over 3305915.69 frames. 
], batch size: 66, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:16:18,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4745780.0, ans=0.0 2024-08-20 09:16:25,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4745780.0, ans=0.2 2024-08-20 09:16:43,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4745880.0, ans=0.0 2024-08-20 09:17:08,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4745980.0, ans=0.1 2024-08-20 09:17:25,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4746080.0, ans=10.0 2024-08-20 09:17:25,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4746080.0, ans=0.0 2024-08-20 09:17:28,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4746180.0, ans=0.125 2024-08-20 09:17:47,693 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 450, loss[loss=0.1174, beats_loss=0.004515, ecapa_loss=0.0001408, whisper_loss=0.1114, over 15289.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01003, ecapa_loss=0.0001408, whisper_loss=0.09009, over 3406872.82 frames. 
], batch size: 54, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:18:12,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.619e+01 2.270e+01 2.468e+01 2.712e+01 4.275e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-20 09:18:15,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4746380.0, ans=0.025 2024-08-20 09:18:19,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4746380.0, ans=0.0 2024-08-20 09:18:20,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4746480.0, ans=0.1 2024-08-20 09:18:40,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4746580.0, ans=0.2 2024-08-20 09:18:54,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-20 09:19:11,521 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 11 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 09:19:13,691 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 09:19:18,906 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 500, loss[loss=0.1111, beats_loss=0.009282, ecapa_loss=0.0001266, whisper_loss=0.1006, over 18071.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01003, ecapa_loss=0.0001401, whisper_loss=0.09093, over 3505443.47 frames. ], batch size: 68, lr: 1.87e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:19:56,593 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 09:20:05,540 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
19 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-20 09:20:09,909 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 17 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 09:20:14,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2024-08-20 09:20:27,042 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 13 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 09:20:36,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4747180.0, ans=0.125 2024-08-20 09:20:39,531 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 36 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-20 09:20:48,493 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 27 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 09:20:50,231 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 550, loss[loss=0.1086, beats_loss=0.01119, ecapa_loss=0.0001174, whisper_loss=0.09621, over 20111.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01005, ecapa_loss=0.0001393, whisper_loss=0.09075, over 3554186.00 frames. ], batch size: 78, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:20:50,469 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 21 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 09:21:00,473 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 09:21:14,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.259e+01 2.517e+01 2.843e+01 4.116e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-20 09:21:17,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4747380.0, ans=0.125 2024-08-20 09:21:20,153 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
27 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 09:21:27,470 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 29 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 09:22:01,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4747680.0, ans=0.125 2024-08-20 09:22:11,434 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-20 09:22:19,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4747680.0, ans=0.125 2024-08-20 09:22:22,667 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 600, loss[loss=0.1329, beats_loss=0.006638, ecapa_loss=0.0001598, whisper_loss=0.1247, over 19777.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01007, ecapa_loss=0.0001387, whisper_loss=0.09076, over 3606606.32 frames. ], batch size: 74, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:22:28,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4747780.0, ans=0.0 2024-08-20 09:22:34,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4747780.0, ans=0.125 2024-08-20 09:22:39,689 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 09:22:40,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=22.5 2024-08-20 09:23:39,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4748180.0, ans=0.2 2024-08-20 09:23:40,918 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
27 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 09:23:51,919 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-20 09:23:52,931 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 650, loss[loss=0.103, beats_loss=0.009578, ecapa_loss=0.0001662, whisper_loss=0.09179, over 21452.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01012, ecapa_loss=0.0001386, whisper_loss=0.09055, over 3668461.97 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:24:17,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.327e+01 2.614e+01 2.843e+01 3.937e+01, threshold=5.228e+01, percent-clipped=0.0 2024-08-20 09:24:23,925 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 09:24:29,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4748480.0, ans=0.0 2024-08-20 09:24:41,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4748480.0, ans=0.125 2024-08-20 09:24:46,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4748580.0, ans=0.125 2024-08-20 09:25:16,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4748680.0, ans=0.035 2024-08-20 09:25:19,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4748780.0, ans=0.125 2024-08-20 09:25:21,294 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 700, loss[loss=0.08644, beats_loss=0.01102, ecapa_loss=0.000126, whisper_loss=0.07417, over 22193.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0101, ecapa_loss=0.0001395, whisper_loss=0.09032, over 3689381.49 frames. 
], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:25:55,188 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 09:26:20,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4749080.0, ans=0.125 2024-08-20 09:26:20,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4749080.0, ans=0.125 2024-08-20 09:26:31,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0 2024-08-20 09:26:32,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2024-08-20 09:26:38,213 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-20 09:26:48,804 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 750, loss[loss=0.104, beats_loss=0.009599, ecapa_loss=0.0001241, whisper_loss=0.09319, over 22083.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01008, ecapa_loss=0.0001404, whisper_loss=0.09021, over 3719930.32 frames. ], batch size: 85, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:27:05,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.95 vs. 
limit=15.0 2024-08-20 09:27:13,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.302e+01 2.530e+01 2.816e+01 3.828e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-20 09:27:34,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4749480.0, ans=0.125 2024-08-20 09:28:03,342 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 09:28:18,119 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 800, loss[loss=0.08869, beats_loss=0.01228, ecapa_loss=0.000135, whisper_loss=0.07505, over 14439.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01011, ecapa_loss=0.0001399, whisper_loss=0.08959, over 3701895.35 frames. ], batch size: 58, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:28:35,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4749880.0, ans=0.0 2024-08-20 09:28:38,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4749880.0, ans=0.0 2024-08-20 09:28:40,085 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 22 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-20 09:28:48,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4749880.0, ans=0.125 2024-08-20 09:28:57,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=4749980.0, ans=15.0 2024-08-20 09:29:16,404 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 09:29:46,186 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 850, loss[loss=0.1063, beats_loss=0.01091, ecapa_loss=0.0001476, whisper_loss=0.0939, over 17650.00 frames. 
], tot_loss[loss=0.09978, beats_loss=0.01024, ecapa_loss=0.0001385, whisper_loss=0.08816, over 3715376.59 frames. ], batch size: 72, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:29:56,064 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 09:29:57,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4750280.0, ans=0.125 2024-08-20 09:29:59,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4750280.0, ans=0.09899494936611666 2024-08-20 09:29:59,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4750280.0, ans=0.125 2024-08-20 09:30:09,346 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 09:30:11,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.195e+01 2.440e+01 2.729e+01 3.750e+01, threshold=4.881e+01, percent-clipped=0.0 2024-08-20 09:30:11,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4750380.0, ans=0.0 2024-08-20 09:30:16,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4750380.0, ans=0.125 2024-08-20 09:30:16,873 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:30:25,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4750480.0, ans=0.0 2024-08-20 09:30:37,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4750580.0, ans=0.1 2024-08-20 09:30:56,948 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
16 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 09:30:57,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4750680.0, ans=0.125 2024-08-20 09:31:04,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4750680.0, ans=0.125 2024-08-20 09:31:06,450 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 09:31:12,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2024-08-20 09:31:15,782 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 900, loss[loss=0.09049, beats_loss=0.009017, ecapa_loss=0.0001215, whisper_loss=0.08025, over 20615.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0102, ecapa_loss=0.0001377, whisper_loss=0.0888, over 3758727.43 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:31:16,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4750780.0, ans=0.0 2024-08-20 09:31:19,860 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 31 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 09:31:22,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4750780.0, ans=0.125 2024-08-20 09:31:25,098 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 09:31:25,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4750780.0, ans=0.125 2024-08-20 09:31:26,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4750780.0, ans=0.09899494936611666 2024-08-20 09:31:49,421 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 18 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 09:32:24,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4751180.0, ans=0.1 2024-08-20 09:32:41,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4751180.0, ans=0.125 2024-08-20 09:32:43,811 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 950, loss[loss=0.1038, beats_loss=0.009096, ecapa_loss=0.0001373, whisper_loss=0.09336, over 14018.00 frames. ], tot_loss[loss=0.09967, beats_loss=0.01026, ecapa_loss=0.000137, whisper_loss=0.08805, over 3752163.72 frames. ], batch size: 54, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:32:55,423 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 23 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-20 09:32:55,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4751280.0, ans=0.2 2024-08-20 09:33:06,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2024-08-20 09:33:08,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.210e+01 2.427e+01 2.730e+01 1.118e+02, threshold=4.854e+01, percent-clipped=2.0 2024-08-20 09:33:26,517 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
32 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 09:33:39,716 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 09:33:43,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4751580.0, ans=0.125 2024-08-20 09:33:48,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4751580.0, ans=0.0 2024-08-20 09:33:48,621 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.128e+01 2024-08-20 09:33:52,036 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 09:33:58,897 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 09:34:01,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=12.0 2024-08-20 09:34:06,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4751680.0, ans=0.125 2024-08-20 09:34:12,748 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1000, loss[loss=0.09766, beats_loss=0.008886, ecapa_loss=0.0001314, whisper_loss=0.08746, over 16526.00 frames. ], tot_loss[loss=0.09997, beats_loss=0.01028, ecapa_loss=0.000136, whisper_loss=0.08833, over 3762470.73 frames. ], batch size: 63, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:34:16,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4751780.0, ans=0.0 2024-08-20 09:35:13,466 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 09:35:42,295 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1050, loss[loss=0.09564, beats_loss=0.009658, ecapa_loss=0.0001592, whisper_loss=0.08439, over 16294.00 frames. ], tot_loss[loss=0.09983, beats_loss=0.01036, ecapa_loss=0.0001352, whisper_loss=0.08812, over 3750558.62 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:35:56,025 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 09:36:08,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.242e+01 2.597e+01 2.833e+01 4.409e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-20 09:36:23,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4752480.0, ans=0.125 2024-08-20 09:36:43,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4752580.0, ans=0.125 2024-08-20 09:36:58,355 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 09:37:09,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.39 vs. limit=22.5 2024-08-20 09:37:12,225 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1100, loss[loss=0.1028, beats_loss=0.009503, ecapa_loss=0.0001243, whisper_loss=0.09201, over 14801.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01033, ecapa_loss=0.0001356, whisper_loss=0.08827, over 3746264.00 frames. 
], batch size: 54, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:37:37,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4752880.0, ans=0.2 2024-08-20 09:37:40,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4752880.0, ans=0.0 2024-08-20 09:37:55,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4752980.0, ans=0.2 2024-08-20 09:38:04,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4752980.0, ans=0.0 2024-08-20 09:38:07,722 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 09:38:15,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4753080.0, ans=0.2 2024-08-20 09:38:22,273 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 09:38:32,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4753180.0, ans=0.125 2024-08-20 09:38:37,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4753180.0, ans=0.125 2024-08-20 09:38:37,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2024-08-20 09:38:42,142 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1150, loss[loss=0.1164, beats_loss=0.00806, ecapa_loss=0.0001226, whisper_loss=0.1071, over 18386.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01017, ecapa_loss=0.0001369, whisper_loss=0.08967, over 3756818.42 frames. 
], batch size: 69, lr: 1.86e-03, grad_scale: 1.152921504606847e+18 2024-08-20 09:38:51,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4753280.0, ans=0.0 2024-08-20 09:38:55,552 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 25 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 09:39:06,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.346e+01 2.635e+01 2.990e+01 2.498e+02, threshold=5.271e+01, percent-clipped=4.0 2024-08-20 09:39:27,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=15.0 2024-08-20 09:40:11,331 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1200, loss[loss=0.1074, beats_loss=0.01048, ecapa_loss=0.0001357, whisper_loss=0.09553, over 22998.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01017, ecapa_loss=0.0001374, whisper_loss=0.0895, over 3738704.91 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:40:13,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4753780.0, ans=0.0 2024-08-20 09:40:31,024 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 14 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 09:40:33,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4753880.0, ans=0.125 2024-08-20 09:40:34,569 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 09:40:51,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4753980.0, ans=0.125 2024-08-20 09:41:36,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4754180.0, ans=0.125 2024-08-20 09:41:37,387 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 24 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-20 09:41:39,236 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1250, loss[loss=0.1, beats_loss=0.01257, ecapa_loss=0.0001263, whisper_loss=0.08618, over 17343.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01022, ecapa_loss=0.0001377, whisper_loss=0.08925, over 3737364.45 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:41:44,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4754280.0, ans=0.125 2024-08-20 09:42:03,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. 
limit=22.5 2024-08-20 09:42:05,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.283e+01 2.750e+01 2.979e+01 6.876e+01, threshold=5.500e+01, percent-clipped=2.0 2024-08-20 09:42:12,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4754480.0, ans=0.125 2024-08-20 09:42:53,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4754680.0, ans=0.125 2024-08-20 09:42:57,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4754680.0, ans=0.015 2024-08-20 09:42:59,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.41 vs. limit=22.5 2024-08-20 09:43:05,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4754780.0, ans=0.125 2024-08-20 09:43:07,363 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1300, loss[loss=0.1039, beats_loss=0.008616, ecapa_loss=0.0001485, whisper_loss=0.09376, over 16900.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01028, ecapa_loss=0.0001379, whisper_loss=0.08879, over 3755420.20 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:43:38,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4754880.0, ans=0.0 2024-08-20 09:43:47,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4754980.0, ans=0.125 2024-08-20 09:44:10,697 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 09:44:32,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2024-08-20 09:44:37,940 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1350, loss[loss=0.1138, beats_loss=0.01079, ecapa_loss=0.0001393, whisper_loss=0.1016, over 15979.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01019, ecapa_loss=0.0001383, whisper_loss=0.08946, over 3755386.53 frames. ], batch size: 63, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:45:03,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4755380.0, ans=0.1 2024-08-20 09:45:04,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.108e+01 2.418e+01 2.623e+01 3.290e+01, threshold=4.836e+01, percent-clipped=0.0 2024-08-20 09:45:08,626 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 09:45:22,836 WARNING [optim.py:496] (0/4) Scaling gradients by 0.032859351485967636, model_norm_threshold=48.36314392089844 2024-08-20 09:45:22,994 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.250e+05, grad_sumsq=3.697e+04, orig_rms_sq=8.792e+00 2024-08-20 09:45:32,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4755580.0, ans=0.025 2024-08-20 09:45:34,135 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
11 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 09:45:34,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4755580.0, ans=0.0 2024-08-20 09:45:48,267 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-20 09:46:03,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4755680.0, ans=0.2 2024-08-20 09:46:03,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0 2024-08-20 09:46:08,151 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1400, loss[loss=0.09314, beats_loss=0.01114, ecapa_loss=0.0001346, whisper_loss=0.08065, over 22630.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001368, whisper_loss=0.08927, over 3739166.54 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:46:10,164 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 09:46:12,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4755780.0, ans=0.125 2024-08-20 09:46:24,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4755880.0, ans=0.125 2024-08-20 09:46:30,586 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 10 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-20 09:46:39,969 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
20 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-20 09:46:49,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4755980.0, ans=0.1 2024-08-20 09:47:07,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4756080.0, ans=0.125 2024-08-20 09:47:09,460 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 26 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-20 09:47:35,271 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1450, loss[loss=0.1123, beats_loss=0.01032, ecapa_loss=0.0001328, whisper_loss=0.1006, over 22197.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01031, ecapa_loss=0.0001378, whisper_loss=0.08899, over 3729309.06 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:47:38,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4756280.0, ans=0.09899494936611666 2024-08-20 09:48:01,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.208e+01 2.529e+01 2.783e+01 1.472e+03, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 09:48:09,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4756480.0, ans=0.125 2024-08-20 09:49:16,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4756680.0, ans=0.5 2024-08-20 09:49:27,617 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 17 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 09:49:30,531 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1500, loss[loss=0.1031, beats_loss=0.01179, ecapa_loss=0.000113, whisper_loss=0.09014, over 22239.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01038, ecapa_loss=0.0001368, whisper_loss=0.08835, over 3730580.43 frames. 
], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:49:51,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-20 09:50:01,796 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 14 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-20 09:50:03,665 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 09:50:09,138 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 32 from LS+wenet, 29 from Vox, 24 fro AS 2024-08-20 09:50:21,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4756980.0, ans=0.125 2024-08-20 09:50:24,511 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 15 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-20 09:50:30,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4757080.0, ans=0.0 2024-08-20 09:50:34,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4757080.0, ans=0.125 2024-08-20 09:50:39,350 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-20 09:50:49,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. 
limit=15.0 2024-08-20 09:50:51,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4757180.0, ans=0.04949747468305833 2024-08-20 09:51:01,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4757280.0, ans=0.125 2024-08-20 09:51:03,016 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1550, loss[loss=0.1025, beats_loss=0.01146, ecapa_loss=0.0001403, whisper_loss=0.08959, over 21879.00 frames. ], tot_loss[loss=0.0999, beats_loss=0.01034, ecapa_loss=0.0001361, whisper_loss=0.08821, over 3734201.92 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:51:30,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.231e+01 2.477e+01 2.793e+01 4.044e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-20 09:51:36,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4757380.0, ans=0.1 2024-08-20 09:51:46,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4757480.0, ans=0.0 2024-08-20 09:51:58,926 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 25 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-20 09:51:59,259 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.281e-02 2024-08-20 09:52:20,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=4757680.0, ans=8.0 2024-08-20 09:52:35,563 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1600, loss[loss=0.07894, beats_loss=0.01201, ecapa_loss=0.0001419, whisper_loss=0.06551, over 17535.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01029, ecapa_loss=0.0001354, whisper_loss=0.08847, over 3757231.23 frames. 
], batch size: 71, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:53:19,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4757980.0, ans=0.125 2024-08-20 09:53:23,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4757980.0, ans=0.125 2024-08-20 09:53:34,621 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 12 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-20 09:53:44,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4758080.0, ans=0.0 2024-08-20 09:53:51,527 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 32 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 09:53:53,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4758180.0, ans=0.1 2024-08-20 09:54:06,363 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1650, loss[loss=0.09982, beats_loss=0.009404, ecapa_loss=0.0001502, whisper_loss=0.08891, over 22105.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01025, ecapa_loss=0.0001354, whisper_loss=0.089, over 3779556.79 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:54:09,388 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 09:54:24,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4758380.0, ans=0.125 2024-08-20 09:54:31,301 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
25 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 09:54:32,270 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.203e+01 2.449e+01 2.785e+01 3.857e+01, threshold=4.898e+01, percent-clipped=0.0 2024-08-20 09:54:49,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4758480.0, ans=0.125 2024-08-20 09:55:27,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4758680.0, ans=0.125 2024-08-20 09:55:31,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4758680.0, ans=0.125 2024-08-20 09:55:34,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4758780.0, ans=0.2 2024-08-20 09:55:35,276 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1700, loss[loss=0.07191, beats_loss=0.01183, ecapa_loss=0.0001406, whisper_loss=0.05868, over 16502.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01021, ecapa_loss=0.0001368, whisper_loss=0.08916, over 3769383.73 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:55:39,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4758780.0, ans=0.0 2024-08-20 09:55:44,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4758780.0, ans=0.125 2024-08-20 09:55:56,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4758880.0, ans=0.125 2024-08-20 09:55:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4758880.0, ans=0.125 2024-08-20 09:56:01,289 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
16 from LS+wenet, 22 from Vox, 17 fro AS 2024-08-20 09:56:44,776 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 09:56:48,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4759180.0, ans=0.0 2024-08-20 09:57:05,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4759280.0, ans=0.125 2024-08-20 09:57:06,565 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1750, loss[loss=0.09545, beats_loss=0.008959, ecapa_loss=0.0001545, whisper_loss=0.08494, over 17166.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01015, ecapa_loss=0.0001365, whisper_loss=0.08989, over 3787886.39 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:57:17,084 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 09:57:24,315 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 23 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-20 09:57:33,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.295e+01 2.510e+01 2.716e+01 9.441e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 09:57:47,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4759480.0, ans=0.0 2024-08-20 09:58:04,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4759580.0, ans=0.0 2024-08-20 09:58:11,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4759580.0, ans=0.125 2024-08-20 09:58:14,773 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 09:58:19,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4759680.0, ans=0.0 2024-08-20 09:58:20,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2024-08-20 09:58:25,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4759680.0, ans=0.125 2024-08-20 09:58:27,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4759680.0, ans=0.2 2024-08-20 09:58:33,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4759780.0, ans=0.125 2024-08-20 09:58:34,331 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1800, loss[loss=0.08864, beats_loss=0.01047, ecapa_loss=0.0001634, whisper_loss=0.07653, over 19273.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01023, ecapa_loss=0.0001364, whisper_loss=0.08927, over 3781199.51 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 09:58:48,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. 
limit=6.0 2024-08-20 09:59:05,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4759880.0, ans=0.1 2024-08-20 09:59:08,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4759980.0, ans=0.0 2024-08-20 09:59:09,748 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-476000.pt 2024-08-20 09:59:22,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4759980.0, ans=0.1 2024-08-20 09:59:35,793 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 09:59:36,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4760080.0, ans=0.2 2024-08-20 09:59:38,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.75 vs. limit=22.5 2024-08-20 10:00:01,336 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1850, loss[loss=0.08306, beats_loss=0.01071, ecapa_loss=0.0001207, whisper_loss=0.07114, over 17048.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01026, ecapa_loss=0.0001359, whisper_loss=0.08906, over 3770707.28 frames. ], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:00:10,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4760280.0, ans=0.05 2024-08-20 10:00:17,566 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 10:00:26,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0 2024-08-20 10:00:27,072 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.285e+01 2.493e+01 2.881e+01 4.103e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-20 10:00:32,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4760380.0, ans=0.025 2024-08-20 10:00:42,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4760480.0, ans=0.125 2024-08-20 10:00:57,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4760580.0, ans=0.2 2024-08-20 10:01:04,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4760580.0, ans=0.0 2024-08-20 10:01:12,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4760680.0, ans=0.125 2024-08-20 10:01:28,636 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1900, loss[loss=0.1011, beats_loss=0.01165, ecapa_loss=0.0001153, whisper_loss=0.08829, over 14948.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01032, ecapa_loss=0.000135, whisper_loss=0.08874, over 3773510.53 frames. 
], batch size: 57, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:01:31,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4760780.0, ans=0.0 2024-08-20 10:01:36,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4760780.0, ans=0.04949747468305833 2024-08-20 10:01:49,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4760880.0, ans=0.125 2024-08-20 10:01:51,492 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 10:01:56,436 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 10:01:59,392 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 19 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-20 10:02:01,262 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 10:02:02,864 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 18 from LS+wenet, 21 from Vox, 51 fro AS 2024-08-20 10:02:15,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4760980.0, ans=0.2 2024-08-20 10:02:18,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4761080.0, ans=0.07 2024-08-20 10:02:27,345 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
26 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 10:02:29,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4761080.0, ans=0.07 2024-08-20 10:02:36,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4761180.0, ans=0.125 2024-08-20 10:02:45,242 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 10:02:45,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-20 10:02:53,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4761280.0, ans=0.1 2024-08-20 10:02:54,995 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 1950, loss[loss=0.08788, beats_loss=0.009168, ecapa_loss=0.00015, whisper_loss=0.07721, over 13550.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01022, ecapa_loss=0.0001356, whisper_loss=0.0895, over 3773414.27 frames. ], batch size: 56, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:03:05,413 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 10:03:09,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4761280.0, ans=0.125 2024-08-20 10:03:13,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4761380.0, ans=0.2 2024-08-20 10:03:19,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.238e+01 2.475e+01 2.855e+01 5.978e+01, threshold=4.950e+01, percent-clipped=1.0 2024-08-20 10:03:23,028 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 
12 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 10:03:35,796 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 10:04:20,882 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2000, loss[loss=0.1205, beats_loss=0.007075, ecapa_loss=0.0001877, whisper_loss=0.1115, over 13444.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0103, ecapa_loss=0.0001365, whisper_loss=0.08853, over 3776967.49 frames. ], batch size: 53, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:05:02,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4761980.0, ans=0.125 2024-08-20 10:05:04,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4761980.0, ans=0.125 2024-08-20 10:05:06,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=12.0 2024-08-20 10:05:07,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4761980.0, ans=0.125 2024-08-20 10:05:13,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=12.0 2024-08-20 10:05:35,392 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 10:05:35,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4762180.0, ans=0.125 2024-08-20 10:05:37,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4762180.0, ans=0.125 2024-08-20 10:05:39,268 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 10:05:42,451 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 10:05:46,734 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2050, loss[loss=0.1163, beats_loss=0.01025, ecapa_loss=0.0001228, whisper_loss=0.1048, over 17919.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01027, ecapa_loss=0.0001363, whisper_loss=0.08888, over 3776147.86 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:05:52,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4762280.0, ans=0.1 2024-08-20 10:06:07,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4762380.0, ans=0.125 2024-08-20 10:06:13,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.228e+01 2.469e+01 2.687e+01 4.353e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-20 10:06:29,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4762480.0, ans=0.95 2024-08-20 10:06:54,985 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 22 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 10:07:12,965 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2100, loss[loss=0.07454, beats_loss=0.01245, ecapa_loss=0.0001075, whisper_loss=0.06102, over 19267.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01027, ecapa_loss=0.0001367, whisper_loss=0.08885, over 3742662.02 frames. ], batch size: 76, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:07:16,999 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
23 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-20 10:07:20,427 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.199e+05 2024-08-20 10:07:51,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4762980.0, ans=0.2 2024-08-20 10:08:00,127 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 10:08:02,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4762980.0, ans=0.0 2024-08-20 10:08:05,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-20 10:08:16,777 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 10:08:25,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4763180.0, ans=0.0 2024-08-20 10:08:29,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4763180.0, ans=0.125 2024-08-20 10:08:38,805 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2150, loss[loss=0.09617, beats_loss=0.01176, ecapa_loss=0.0001356, whisper_loss=0.08305, over 21966.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01023, ecapa_loss=0.0001365, whisper_loss=0.08886, over 3745744.58 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:08:45,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.81 vs. 
limit=15.0 2024-08-20 10:09:05,279 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.276e+01 2.500e+01 2.856e+01 5.859e+01, threshold=5.000e+01, percent-clipped=1.0 2024-08-20 10:09:05,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4763380.0, ans=0.125 2024-08-20 10:09:06,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.28 vs. limit=10.0 2024-08-20 10:09:08,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.52 vs. limit=22.5 2024-08-20 10:09:14,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4763480.0, ans=0.0 2024-08-20 10:09:16,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=4763480.0, ans=22.5 2024-08-20 10:09:32,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4763580.0, ans=0.0 2024-08-20 10:09:42,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4763580.0, ans=0.1 2024-08-20 10:09:49,491 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 10:09:51,167 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 10:09:57,805 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 10:10:05,301 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2200, loss[loss=0.1222, beats_loss=0.007684, ecapa_loss=0.0001586, whisper_loss=0.113, over 14393.00 frames. 
], tot_loss[loss=0.1005, beats_loss=0.01031, ecapa_loss=0.0001354, whisper_loss=0.08885, over 3767676.21 frames. ], batch size: 57, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:10:19,035 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 10:10:26,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=12.0 2024-08-20 10:10:33,151 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 10:10:40,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4763980.0, ans=0.0 2024-08-20 10:10:40,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0 2024-08-20 10:10:54,992 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 10:11:00,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4764080.0, ans=0.0 2024-08-20 10:11:03,593 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 10:11:19,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=4764180.0, ans=0.1 2024-08-20 10:11:25,136 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 10:11:30,007 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2250, loss[loss=0.1103, beats_loss=0.01045, ecapa_loss=0.0001098, whisper_loss=0.09878, over 24504.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01031, ecapa_loss=0.0001352, whisper_loss=0.08939, over 3747844.15 frames. 
], batch size: 92, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:11:31,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4764280.0, ans=0.125 2024-08-20 10:11:40,763 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 10:11:49,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4764380.0, ans=0.125 2024-08-20 10:11:55,373 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.206e+01 2.415e+01 2.754e+01 4.736e+01, threshold=4.831e+01, percent-clipped=0.0 2024-08-20 10:12:06,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-08-20 10:12:13,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4764480.0, ans=0.125 2024-08-20 10:12:37,189 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 10:12:40,704 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-20 10:12:44,249 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 10:12:55,544 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2300, loss[loss=0.103, beats_loss=0.01156, ecapa_loss=0.0001421, whisper_loss=0.09006, over 18408.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0103, ecapa_loss=0.0001363, whisper_loss=0.08997, over 3736480.55 frames. 
], batch size: 75, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:12:59,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4764780.0, ans=0.1 2024-08-20 10:13:01,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4764780.0, ans=0.0 2024-08-20 10:13:26,445 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 21 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 10:13:26,687 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:13:51,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4765080.0, ans=0.125 2024-08-20 10:14:08,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4765180.0, ans=0.125 2024-08-20 10:14:12,390 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-20 10:14:20,894 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 10:14:21,862 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2350, loss[loss=0.1117, beats_loss=0.009802, ecapa_loss=0.000117, whisper_loss=0.1008, over 17445.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0103, ecapa_loss=0.0001371, whisper_loss=0.0906, over 3744093.88 frames. ], batch size: 67, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:14:31,463 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 10:14:35,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4765280.0, ans=0.0 2024-08-20 10:14:48,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.264e+01 2.559e+01 2.893e+01 5.027e+01, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 10:14:52,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4765380.0, ans=0.125 2024-08-20 10:14:53,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4765380.0, ans=0.0 2024-08-20 10:15:29,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4765680.0, ans=0.2 2024-08-20 10:15:32,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4765680.0, ans=0.125 2024-08-20 10:15:46,591 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2400, loss[loss=0.09693, beats_loss=0.01083, ecapa_loss=0.0001803, whisper_loss=0.0843, over 21067.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001372, whisper_loss=0.09, over 3756766.55 frames. 
], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:15:48,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4765780.0, ans=0.125 2024-08-20 10:15:57,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4765780.0, ans=0.0 2024-08-20 10:16:05,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4765880.0, ans=0.125 2024-08-20 10:16:43,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4766080.0, ans=0.0 2024-08-20 10:17:11,570 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2450, loss[loss=0.09368, beats_loss=0.009583, ecapa_loss=0.0001348, whisper_loss=0.08275, over 23594.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0103, ecapa_loss=0.0001375, whisper_loss=0.09034, over 3770057.73 frames. 
], batch size: 93, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:17:29,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4766380.0, ans=0.95 2024-08-20 10:17:33,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4766380.0, ans=0.2 2024-08-20 10:17:35,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4766380.0, ans=0.1 2024-08-20 10:17:38,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.311e+01 2.583e+01 2.758e+01 5.133e+01, threshold=5.165e+01, percent-clipped=1.0 2024-08-20 10:18:02,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4766480.0, ans=0.125 2024-08-20 10:18:15,677 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 34 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-20 10:18:21,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4766680.0, ans=0.035 2024-08-20 10:18:28,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0 2024-08-20 10:18:31,926 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
23 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-20 10:18:32,209 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=7.858e-03 2024-08-20 10:18:35,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4766680.0, ans=0.125 2024-08-20 10:18:40,982 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2500, loss[loss=0.09273, beats_loss=0.0112, ecapa_loss=0.0001208, whisper_loss=0.08032, over 22378.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.000137, whisper_loss=0.09013, over 3769600.74 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:19:00,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4766880.0, ans=0.1 2024-08-20 10:19:09,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5 2024-08-20 10:19:23,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2024-08-20 10:19:27,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0 2024-08-20 10:19:44,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.99 vs. limit=22.5 2024-08-20 10:20:00,109 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 10:20:02,024 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
30 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 10:20:02,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4767180.0, ans=0.125 2024-08-20 10:20:12,090 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2550, loss[loss=0.08647, beats_loss=0.01085, ecapa_loss=0.000137, whisper_loss=0.07425, over 13632.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001369, whisper_loss=0.08915, over 3778695.08 frames. ], batch size: 54, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:20:24,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=15.0 2024-08-20 10:20:34,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4767380.0, ans=0.125 2024-08-20 10:20:38,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4767380.0, ans=0.1 2024-08-20 10:20:39,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.312e+01 2.481e+01 2.687e+01 3.912e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 10:20:54,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4767480.0, ans=0.09899494936611666 2024-08-20 10:20:57,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. 
limit=15.0 2024-08-20 10:21:11,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4767580.0, ans=0.0 2024-08-20 10:21:15,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4767580.0, ans=0.05 2024-08-20 10:21:18,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4767580.0, ans=0.0 2024-08-20 10:21:18,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2024-08-20 10:21:28,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-20 10:21:42,276 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2600, loss[loss=0.1048, beats_loss=0.01087, ecapa_loss=0.0001171, whisper_loss=0.09276, over 23722.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001373, whisper_loss=0.09029, over 3817154.06 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:22:12,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. 
limit=15.0 2024-08-20 10:22:25,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4767980.0, ans=0.0 2024-08-20 10:22:40,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4768080.0, ans=0.0 2024-08-20 10:22:58,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4768180.0, ans=0.125 2024-08-20 10:22:58,272 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:23:02,189 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 19 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-20 10:23:07,454 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 10:23:09,346 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 38 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 10:23:10,966 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2650, loss[loss=0.1288, beats_loss=0.007, ecapa_loss=0.0001514, whisper_loss=0.1202, over 22688.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.0001385, whisper_loss=0.09089, over 3858910.34 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:23:15,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4768280.0, ans=0.0 2024-08-20 10:23:18,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4768280.0, ans=0.125 2024-08-20 10:23:29,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.71 vs. 
limit=10.0 2024-08-20 10:23:38,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.716e+01 2.210e+01 2.428e+01 2.721e+01 4.084e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 10:23:41,883 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-20 10:24:00,975 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 10:24:04,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4768580.0, ans=0.0 2024-08-20 10:24:40,506 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 10:24:41,438 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2700, loss[loss=0.1027, beats_loss=0.01082, ecapa_loss=0.000146, whisper_loss=0.09041, over 21963.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001381, whisper_loss=0.09087, over 3845504.31 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:24:51,167 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:24:55,778 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 10:24:59,244 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 10:25:17,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2024-08-20 10:25:42,479 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-20 10:25:46,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4769080.0, ans=0.125 2024-08-20 10:26:12,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-20 10:26:12,618 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2750, loss[loss=0.1107, beats_loss=0.008367, ecapa_loss=0.0001741, whisper_loss=0.1006, over 22089.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001372, whisper_loss=0.09048, over 3840228.63 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:26:24,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4769280.0, ans=0.125 2024-08-20 10:26:38,574 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.290e+01 2.547e+01 2.861e+01 3.965e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-20 10:26:43,949 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 10:26:47,512 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 10:27:00,033 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 10:27:13,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.98 vs. limit=15.0 2024-08-20 10:27:41,794 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2800, loss[loss=0.09891, beats_loss=0.01341, ecapa_loss=9.247e-05, whisper_loss=0.08457, over 17780.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001377, whisper_loss=0.09012, over 3776234.50 frames. 
], batch size: 68, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:27:42,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4769780.0, ans=0.125 2024-08-20 10:27:42,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4769780.0, ans=0.125 2024-08-20 10:28:11,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-08-20 10:28:12,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4769880.0, ans=0.125 2024-08-20 10:29:05,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4770180.0, ans=0.1 2024-08-20 10:29:10,468 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2850, loss[loss=0.1017, beats_loss=0.008224, ecapa_loss=0.0001774, whisper_loss=0.09168, over 21422.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001384, whisper_loss=0.09047, over 3766064.57 frames. ], batch size: 87, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:29:10,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4770280.0, ans=0.125 2024-08-20 10:29:15,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4770280.0, ans=0.0 2024-08-20 10:29:32,619 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 10:29:37,329 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.626e+01 2.251e+01 2.450e+01 2.765e+01 5.044e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-20 10:29:44,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4770480.0, ans=0.125 2024-08-20 10:29:47,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4770480.0, ans=0.125 2024-08-20 10:29:53,226 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 19 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 10:30:16,278 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-20 10:30:20,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4770680.0, ans=0.0 2024-08-20 10:30:21,988 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 22 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-20 10:30:33,596 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 10:30:38,577 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2900, loss[loss=0.09991, beats_loss=0.01083, ecapa_loss=0.0001493, whisper_loss=0.08758, over 18927.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001379, whisper_loss=0.08999, over 3789253.31 frames. 
], batch size: 76, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:30:42,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4770780.0, ans=0.0 2024-08-20 10:30:44,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4770780.0, ans=0.125 2024-08-20 10:30:45,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4770780.0, ans=0.125 2024-08-20 10:31:03,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=22.5 2024-08-20 10:31:04,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4770880.0, ans=0.125 2024-08-20 10:31:06,677 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 18 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 10:31:26,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2024-08-20 10:31:33,913 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 10:32:02,606 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.159e+01 2024-08-20 10:32:04,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2024-08-20 10:32:08,262 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 2950, loss[loss=0.1258, beats_loss=0.006743, ecapa_loss=0.0001529, whisper_loss=0.1176, over 14182.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001384, whisper_loss=0.0901, over 3767498.96 frames. 
], batch size: 53, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:32:35,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.026e+01 2.380e+01 2.528e+01 2.893e+01 7.268e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-20 10:32:50,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4771480.0, ans=0.0 2024-08-20 10:33:05,077 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 14 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 10:33:10,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4771580.0, ans=0.2 2024-08-20 10:33:12,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0 2024-08-20 10:33:14,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2024-08-20 10:33:19,108 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-20 10:33:37,798 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3000, loss[loss=0.1063, beats_loss=0.01022, ecapa_loss=0.0001381, whisper_loss=0.0947, over 22390.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001387, whisper_loss=0.09029, over 3818253.32 frames. ], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:33:37,799 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 10:34:13,838 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on ASR_libri: loss=0.2557, beats_loss=0, ecapa_loss=0.0005125, whisper_loss=0.2506, over 931116.00 frames. 
2024-08-20 10:34:36,595 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on SV_voxceleb1: loss=0.003928, beats_loss=0, ecapa_loss=0.0003928, whisper_loss=0, over 944235.00 frames. 2024-08-20 10:36:13,009 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 10:36:13,013 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 10:36:14,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4771780.0, ans=0.125 2024-08-20 10:36:33,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4771880.0, ans=0.125 2024-08-20 10:36:36,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4771880.0, ans=0.125 2024-08-20 10:36:52,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4771980.0, ans=0.125 2024-08-20 10:36:55,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4771980.0, ans=0.125 2024-08-20 10:37:14,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.36 vs. limit=22.5 2024-08-20 10:37:31,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2024-08-20 10:37:33,585 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3050, loss[loss=0.1124, beats_loss=0.01072, ecapa_loss=0.0001124, whisper_loss=0.1006, over 23130.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.09085, over 3827121.08 frames. 
], batch size: 88, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:37:58,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.325e+01 2.539e+01 2.982e+01 4.388e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-20 10:38:02,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4772380.0, ans=0.0 2024-08-20 10:38:06,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4772480.0, ans=0.125 2024-08-20 10:38:07,334 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-20 10:38:33,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4772580.0, ans=0.2 2024-08-20 10:38:36,810 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 33 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 10:38:37,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4772580.0, ans=0.0 2024-08-20 10:38:38,607 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 10:38:45,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4772680.0, ans=0.0 2024-08-20 10:38:46,311 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 10:38:55,598 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3100, loss[loss=0.0959, beats_loss=0.01315, ecapa_loss=9.247e-05, whisper_loss=0.08183, over 17511.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001385, whisper_loss=0.09037, over 3837696.14 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:39:02,909 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 10:39:07,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4772780.0, ans=0.125 2024-08-20 10:39:17,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4772880.0, ans=0.2 2024-08-20 10:39:25,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4772880.0, ans=0.0 2024-08-20 10:39:29,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4772980.0, ans=0.1 2024-08-20 10:39:35,309 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 10:39:37,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4772980.0, ans=0.125 2024-08-20 10:39:39,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=22.5 2024-08-20 10:39:50,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4773080.0, ans=0.125 2024-08-20 10:39:56,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4773080.0, ans=0.2 2024-08-20 10:40:17,700 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3150, loss[loss=0.1171, beats_loss=0.01105, ecapa_loss=0.000132, whisper_loss=0.1047, over 18165.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.09035, over 3840321.44 frames. 
], batch size: 66, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:40:24,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4773280.0, ans=0.0 2024-08-20 10:40:33,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4773380.0, ans=0.125 2024-08-20 10:40:35,518 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-20 10:40:42,187 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.206e+01 2.480e+01 2.972e+01 5.332e+01, threshold=4.960e+01, percent-clipped=1.0 2024-08-20 10:40:42,411 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 10:41:14,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5 2024-08-20 10:41:15,600 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-20 10:41:25,519 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 10:41:30,447 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 11 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 10:41:30,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4773680.0, ans=0.125 2024-08-20 10:41:38,129 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3200, loss[loss=0.09778, beats_loss=0.01092, ecapa_loss=9.581e-05, whisper_loss=0.0859, over 13943.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.08945, over 3790754.80 frames. 
], batch size: 51, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:41:41,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4773780.0, ans=0.125 2024-08-20 10:41:50,093 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 17 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-20 10:41:50,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4773780.0, ans=0.0 2024-08-20 10:41:54,479 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 10:41:54,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4773880.0, ans=0.125 2024-08-20 10:41:54,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=22.5 2024-08-20 10:41:56,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4773880.0, ans=0.1 2024-08-20 10:42:30,708 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-20 10:42:37,441 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 15 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 10:42:43,505 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 10:42:43,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4774180.0, ans=0.125 2024-08-20 10:42:55,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4774180.0, ans=0.125 2024-08-20 10:42:59,581 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3250, loss[loss=0.1249, beats_loss=0.01007, ecapa_loss=0.0001239, whisper_loss=0.1136, over 24350.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001407, whisper_loss=0.0904, over 3798955.57 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:43:04,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4774280.0, ans=0.125 2024-08-20 10:43:11,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2024-08-20 10:43:18,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4774380.0, ans=0.05 2024-08-20 10:43:25,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.267e+01 2.440e+01 2.710e+01 3.634e+01, threshold=4.881e+01, percent-clipped=0.0 2024-08-20 10:43:35,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.84 vs. limit=6.0 2024-08-20 10:43:55,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4774580.0, ans=0.125 2024-08-20 10:44:07,996 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
23 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 10:44:12,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.28 vs. limit=22.5 2024-08-20 10:44:13,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-20 10:44:14,636 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 21 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 10:44:23,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4774780.0, ans=0.2 2024-08-20 10:44:25,476 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3300, loss[loss=0.1174, beats_loss=0.01067, ecapa_loss=0.0001429, whisper_loss=0.1053, over 22753.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001397, whisper_loss=0.0903, over 3812144.92 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:44:47,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-08-20 10:44:57,410 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 10:45:19,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.77 vs. limit=22.5 2024-08-20 10:45:50,682 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3350, loss[loss=0.09469, beats_loss=0.01534, ecapa_loss=9.588e-05, whisper_loss=0.07839, over 17390.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.09033, over 3794970.21 frames. ], batch size: 69, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:45:50,840 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
26 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-20 10:45:55,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4775280.0, ans=0.05 2024-08-20 10:46:16,920 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.283e+01 2.507e+01 2.665e+01 5.653e+01, threshold=5.015e+01, percent-clipped=1.0 2024-08-20 10:46:23,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4775480.0, ans=0.95 2024-08-20 10:46:28,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4775480.0, ans=0.125 2024-08-20 10:46:40,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4775580.0, ans=0.125 2024-08-20 10:46:51,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4775580.0, ans=0.125 2024-08-20 10:46:58,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4775680.0, ans=0.125 2024-08-20 10:47:11,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4775780.0, ans=0.125 2024-08-20 10:47:13,171 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3400, loss[loss=0.06927, beats_loss=0.01098, ecapa_loss=0.0001518, whisper_loss=0.05677, over 15938.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.08951, over 3771252.99 frames. ], batch size: 68, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:47:14,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. 
limit=15.0 2024-08-20 10:47:20,257 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 22 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-20 10:47:28,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4775880.0, ans=0.0 2024-08-20 10:47:32,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-20 10:47:42,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4775880.0, ans=0.125 2024-08-20 10:48:00,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2024-08-20 10:48:07,088 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 25 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-20 10:48:36,714 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3450, loss[loss=0.1131, beats_loss=0.0105, ecapa_loss=0.000117, whisper_loss=0.1015, over 19533.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.08934, over 3820084.31 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:49:02,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.189e+01 2.533e+01 2.792e+01 4.546e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-20 10:49:08,731 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-20 10:49:54,784 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 22 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 10:49:59,513 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3500, loss[loss=0.1037, beats_loss=0.01018, ecapa_loss=0.0001421, whisper_loss=0.09205, over 20669.00 frames. 
], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001403, whisper_loss=0.08915, over 3783944.14 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:50:13,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4776780.0, ans=0.0 2024-08-20 10:50:32,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4776980.0, ans=0.125 2024-08-20 10:50:45,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4776980.0, ans=0.125 2024-08-20 10:51:22,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4777180.0, ans=0.125 2024-08-20 10:51:24,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4777280.0, ans=0.0 2024-08-20 10:51:25,490 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3550, loss[loss=0.09245, beats_loss=0.0137, ecapa_loss=0.0001419, whisper_loss=0.07733, over 19816.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01044, ecapa_loss=0.0001402, whisper_loss=0.0888, over 3804719.14 frames. ], batch size: 84, lr: 1.86e-03, grad_scale: 5.764607523034235e+17 2024-08-20 10:51:26,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4777280.0, ans=0.125 2024-08-20 10:51:41,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4777380.0, ans=0.0 2024-08-20 10:51:46,837 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 
29 from LS+wenet, 29 from Vox, 37 from AS
2024-08-20 10:51:52,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4777380.0, ans=0.125
2024-08-20 10:51:53,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.256e+01 2.446e+01 2.719e+01 3.472e+01, threshold=4.892e+01, percent-clipped=0.0
2024-08-20 10:51:59,173 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 29 from LS+wenet, 12 from Vox, 42 from AS
2024-08-20 10:52:01,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4777480.0, ans=0.0
2024-08-20 10:52:14,676 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 from AS
2024-08-20 10:52:21,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4777580.0, ans=0.0
2024-08-20 10:52:23,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4777580.0, ans=0.2
2024-08-20 10:52:27,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0
2024-08-20 10:52:30,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0
2024-08-20 10:52:42,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4777680.0, ans=0.125
2024-08-20 10:52:52,781 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3600, loss[loss=0.09015, beats_loss=0.01101, ecapa_loss=0.0001468, whisper_loss=0.07767, over 19943.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001394, whisper_loss=0.089, over 3842760.42 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 5.764607523034235e+17
2024-08-20 10:53:10,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4777880.0, ans=0.0
2024-08-20 10:53:42,392 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 from AS
2024-08-20 10:54:44,392 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 18 from LS+wenet, 11 from Vox, 22 from AS
2024-08-20 10:54:51,591 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3650, loss[loss=0.08932, beats_loss=0.01259, ecapa_loss=0.0001274, whisper_loss=0.07546, over 18840.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.08872, over 3810576.61 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 10:55:04,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4778280.0, ans=0.125
2024-08-20 10:55:33,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.274e+01 2.510e+01 2.811e+01 1.402e+02, threshold=5.019e+01, percent-clipped=2.0
2024-08-20 10:55:36,411 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 from AS
2024-08-20 10:55:44,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4778480.0, ans=0.2
2024-08-20 10:56:07,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4778580.0, ans=0.125
2024-08-20 10:56:39,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0
2024-08-20 10:56:53,782 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3700, loss[loss=0.09554, beats_loss=0.01036, ecapa_loss=0.0001439, whisper_loss=0.08374, over 13664.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.08894, over 3798921.25 frames. ], batch size: 55, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 10:57:01,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. limit=6.0
2024-08-20 10:57:07,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4778780.0, ans=0.0
2024-08-20 10:57:15,699 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 17 from LS+wenet, 31 from Vox, 31 from AS
2024-08-20 10:58:05,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4779080.0, ans=0.0
2024-08-20 10:58:14,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4779080.0, ans=0.125
2024-08-20 10:58:41,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0
2024-08-20 10:58:44,824 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3750, loss[loss=0.1124, beats_loss=0.00943, ecapa_loss=0.0001215, whisper_loss=0.1017, over 19976.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.08939, over 3776757.19 frames. ], batch size: 77, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 10:59:01,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4779280.0, ans=0.125
2024-08-20 10:59:03,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4779280.0, ans=0.95
2024-08-20 10:59:25,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5
2024-08-20 10:59:28,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.202e+01 2.451e+01 2.797e+01 4.489e+01, threshold=4.903e+01, percent-clipped=0.0
2024-08-20 10:59:32,482 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 31 from LS+wenet, 14 from Vox, 43 from AS
2024-08-20 11:00:13,125 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 21 from LS+wenet, 28 from Vox, 32 from AS
2024-08-20 11:00:13,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4779580.0, ans=0.125
2024-08-20 11:00:23,432 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 27 from LS+wenet, 16 from Vox, 29 from AS
2024-08-20 11:00:34,835 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 23 from LS+wenet, 29 from Vox, 39 from AS
2024-08-20 11:00:50,444 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3800, loss[loss=0.1029, beats_loss=0.0113, ecapa_loss=0.0001245, whisper_loss=0.09032, over 21969.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.089, over 3811221.43 frames. ], batch size: 86, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:01:08,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4779780.0, ans=0.2
2024-08-20 11:01:22,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4779880.0, ans=0.1
2024-08-20 11:01:24,980 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 from AS
2024-08-20 11:02:01,799 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 from AS
2024-08-20 11:02:11,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4780080.0, ans=0.0
2024-08-20 11:02:18,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.89 vs. limit=15.0
2024-08-20 11:02:29,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4780180.0, ans=0.1
2024-08-20 11:02:29,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4780180.0, ans=0.1
2024-08-20 11:02:37,658 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS
2024-08-20 11:02:45,319 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 from AS
2024-08-20 11:02:45,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4780180.0, ans=0.09899494936611666
2024-08-20 11:02:48,796 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 8 from Vox, 33 from AS
2024-08-20 11:02:57,336 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3850, loss[loss=0.105, beats_loss=0.009416, ecapa_loss=0.0001428, whisper_loss=0.09419, over 16028.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001385, whisper_loss=0.08913, over 3825785.20 frames. ], batch size: 61, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:03:25,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4780380.0, ans=0.2
2024-08-20 11:03:25,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0
2024-08-20 11:03:32,601 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.875e+01 2.347e+01 2.624e+01 2.897e+01 4.079e+01, threshold=5.248e+01, percent-clipped=0.0
2024-08-20 11:03:48,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4780480.0, ans=0.125
2024-08-20 11:04:11,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4780580.0, ans=0.0
2024-08-20 11:04:20,835 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.066e+00
2024-08-20 11:04:20,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4780680.0, ans=0.0
2024-08-20 11:04:23,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4780680.0, ans=0.125
2024-08-20 11:04:25,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4780680.0, ans=0.125
2024-08-20 11:04:40,744 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3900, loss[loss=0.08524, beats_loss=0.01474, ecapa_loss=0.0001125, whisper_loss=0.06937, over 17817.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001388, whisper_loss=0.08895, over 3805546.26 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:04:43,224 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 29 from LS+wenet, 25 from Vox, 26 from AS
2024-08-20 11:04:47,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4780780.0, ans=0.04949747468305833
2024-08-20 11:04:51,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4780780.0, ans=0.04949747468305833
2024-08-20 11:04:54,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4780780.0, ans=0.5
2024-08-20 11:05:06,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4780880.0, ans=0.125
2024-08-20 11:05:07,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4780880.0, ans=0.125
2024-08-20 11:05:55,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.35 vs. limit=15.0
2024-08-20 11:05:57,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0
2024-08-20 11:06:05,344 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 from AS
2024-08-20 11:06:05,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4781180.0, ans=0.125
2024-08-20 11:06:09,780 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 13 from LS+wenet, 32 from Vox, 39 from AS
2024-08-20 11:06:14,849 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 17 from Vox, 38 from AS
2024-08-20 11:06:31,149 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 3950, loss[loss=0.1136, beats_loss=0.008907, ecapa_loss=0.0001387, whisper_loss=0.1033, over 17767.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.09037, over 3828545.58 frames. ], batch size: 69, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:06:37,900 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 15 from LS+wenet, 24 from Vox, 27 from AS
2024-08-20 11:07:08,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.311e+01 2.584e+01 2.900e+01 3.704e+01, threshold=5.168e+01, percent-clipped=0.0
2024-08-20 11:07:25,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4781480.0, ans=0.125
2024-08-20 11:07:47,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4781580.0, ans=0.05
2024-08-20 11:08:15,354 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4000, loss[loss=0.09853, beats_loss=0.01215, ecapa_loss=0.0001366, whisper_loss=0.08501, over 22136.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.09015, over 3818919.69 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:08:17,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4781780.0, ans=0.2
2024-08-20 11:08:27,546 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 16 from LS+wenet, 18 from Vox, 15 from AS
2024-08-20 11:08:36,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4781880.0, ans=0.125
2024-08-20 11:08:38,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4781880.0, ans=0.0
2024-08-20 11:09:09,963 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 from AS
2024-08-20 11:09:16,239 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 16 from LS+wenet, 22 from Vox, 40 from AS
2024-08-20 11:09:23,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.86 vs. limit=22.5
2024-08-20 11:09:25,841 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 20 from LS+wenet, 26 from Vox, 46 from AS
2024-08-20 11:09:33,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0
2024-08-20 11:10:02,840 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 from AS
2024-08-20 11:10:07,426 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4050, loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001188, whisper_loss=0.08958, over 23267.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01029, ecapa_loss=0.0001396, whisper_loss=0.09053, over 3815928.98 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:10:50,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.323e+01 2.567e+01 2.848e+01 3.981e+01, threshold=5.134e+01, percent-clipped=0.0
2024-08-20 11:10:50,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4782380.0, ans=0.1
2024-08-20 11:10:52,814 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 25 from LS+wenet, 24 from Vox, 24 from AS
2024-08-20 11:11:33,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4782580.0, ans=0.025
2024-08-20 11:11:55,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=12.0
2024-08-20 11:12:09,986 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4100, loss[loss=0.08152, beats_loss=0.01145, ecapa_loss=0.0001463, whisper_loss=0.06861, over 19546.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01024, ecapa_loss=0.0001405, whisper_loss=0.09129, over 3837351.95 frames. ], batch size: 84, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:12:18,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4782780.0, ans=0.0
2024-08-20 11:12:32,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0
2024-08-20 11:12:37,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0
2024-08-20 11:12:45,765 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 9 from Vox, 33 from AS
2024-08-20 11:12:53,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4782980.0, ans=0.125
2024-08-20 11:12:53,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=4782980.0, ans=0.1
2024-08-20 11:12:58,326 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 11:13:00,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4782980.0, ans=0.0
2024-08-20 11:13:01,868 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 19 from Vox, 39 from AS
2024-08-20 11:13:41,068 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4150, loss[loss=0.1426, beats_loss=0.007152, ecapa_loss=0.0001659, whisper_loss=0.1338, over 19600.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01025, ecapa_loss=0.0001407, whisper_loss=0.09197, over 3836896.36 frames. ], batch size: 77, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:13:48,804 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 30 from LS+wenet, 20 from Vox, 23 from AS
2024-08-20 11:14:01,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4783380.0, ans=0.09899494936611666
2024-08-20 11:14:11,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.979e+01 2.393e+01 2.670e+01 3.158e+01 1.265e+02, threshold=5.340e+01, percent-clipped=2.0
2024-08-20 11:14:14,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4783380.0, ans=0.5
2024-08-20 11:15:08,168 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4200, loss[loss=0.1109, beats_loss=0.009687, ecapa_loss=0.0001695, whisper_loss=0.09956, over 20214.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001415, whisper_loss=0.09142, over 3849402.71 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:15:14,184 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 28 from LS+wenet, 21 from Vox, 30 from AS
2024-08-20 11:15:14,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4783780.0, ans=0.125
2024-08-20 11:15:24,804 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 from AS
2024-08-20 11:15:34,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0
2024-08-20 11:15:42,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4783980.0, ans=0.125
2024-08-20 11:15:43,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4783980.0, ans=0.125
2024-08-20 11:15:50,255 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 from AS
2024-08-20 11:15:54,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4783980.0, ans=0.125
2024-08-20 11:16:06,381 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS
2024-08-20 11:16:06,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4784080.0, ans=0.125
2024-08-20 11:16:21,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4784180.0, ans=0.0
2024-08-20 11:16:37,532 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4250, loss[loss=0.09071, beats_loss=0.009598, ecapa_loss=0.0001683, whisper_loss=0.07942, over 16833.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001415, whisper_loss=0.0911, over 3842097.00 frames. ], batch size: 70, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:17:04,508 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 31 from LS+wenet, 18 from Vox, 46 from AS
2024-08-20 11:17:07,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.233e+01 2.477e+01 2.798e+01 4.198e+01, threshold=4.955e+01, percent-clipped=0.0
2024-08-20 11:17:08,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4784380.0, ans=0.0
2024-08-20 11:17:11,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. limit=10.0
2024-08-20 11:17:18,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4784480.0, ans=0.0
2024-08-20 11:17:39,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4784580.0, ans=0.04949747468305833
2024-08-20 11:18:05,736 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4300, loss[loss=0.106, beats_loss=0.009648, ecapa_loss=0.0001237, whisper_loss=0.09512, over 21374.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001405, whisper_loss=0.08986, over 3843030.81 frames. ], batch size: 83, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:18:13,845 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 28 from LS+wenet, 19 from Vox, 47 from AS
2024-08-20 11:18:30,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4784880.0, ans=0.125
2024-08-20 11:18:33,982 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 from AS
2024-08-20 11:18:39,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4784980.0, ans=0.125
2024-08-20 11:19:10,442 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 from AS
2024-08-20 11:19:19,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4785180.0, ans=0.0
2024-08-20 11:19:31,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4785280.0, ans=0.125
2024-08-20 11:19:32,882 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4350, loss[loss=0.1039, beats_loss=0.01163, ecapa_loss=0.000131, whisper_loss=0.09099, over 22921.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001394, whisper_loss=0.08967, over 3851018.91 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:19:48,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4785280.0, ans=0.125
2024-08-20 11:19:54,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4785380.0, ans=0.0
2024-08-20 11:20:02,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.221e+01 2.524e+01 2.843e+01 5.218e+01, threshold=5.048e+01, percent-clipped=1.0
2024-08-20 11:20:12,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4785480.0, ans=0.04949747468305833
2024-08-20 11:20:33,530 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 25 from LS+wenet, 17 from Vox, 27 from AS
2024-08-20 11:20:43,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0
2024-08-20 11:20:53,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4785680.0, ans=0.125
2024-08-20 11:20:53,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.06 vs. limit=22.5
2024-08-20 11:21:01,237 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4400, loss[loss=0.1038, beats_loss=0.01141, ecapa_loss=0.0001226, whisper_loss=0.09112, over 18782.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001392, whisper_loss=0.09011, over 3859898.73 frames. ], batch size: 73, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:21:28,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4785880.0, ans=0.0
2024-08-20 11:21:31,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4785880.0, ans=0.0
2024-08-20 11:22:08,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4786080.0, ans=0.1
2024-08-20 11:22:08,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4786080.0, ans=0.125
2024-08-20 11:22:24,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4786180.0, ans=10.0
2024-08-20 11:22:24,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4786180.0, ans=0.2
2024-08-20 11:22:30,824 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4450, loss[loss=0.1001, beats_loss=0.01254, ecapa_loss=0.0001099, whisper_loss=0.08649, over 16022.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001401, whisper_loss=0.09008, over 3819417.74 frames. ], batch size: 61, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:22:35,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4786280.0, ans=10.0
2024-08-20 11:22:53,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4786380.0, ans=0.1
2024-08-20 11:23:00,231 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.276e+01 2.505e+01 2.862e+01 6.840e+01, threshold=5.011e+01, percent-clipped=2.0
2024-08-20 11:23:20,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4786480.0, ans=0.09899494936611666
2024-08-20 11:23:42,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4786680.0, ans=0.125
2024-08-20 11:23:48,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4786680.0, ans=0.0
2024-08-20 11:23:50,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5
2024-08-20 11:23:57,870 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4500, loss[loss=0.1018, beats_loss=0.009443, ecapa_loss=0.0001515, whisper_loss=0.09089, over 22018.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001415, whisper_loss=0.09068, over 3826154.16 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:23:58,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4786780.0, ans=0.0
2024-08-20 11:24:05,172 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 from AS
2024-08-20 11:24:16,028 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 from AS
2024-08-20 11:24:17,494 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 from AS
2024-08-20 11:24:32,842 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 31 from LS+wenet, 15 from Vox, 37 from AS
2024-08-20 11:24:33,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0
2024-08-20 11:24:42,372 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06896068155765533, model_norm_threshold=50.106773376464844
2024-08-20 11:24:42,525 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.994e+04, grad_sumsq=4.994e+04, orig_rms_sq=1.000e+00
2024-08-20 11:24:50,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4787080.0, ans=0.0
2024-08-20 11:25:01,726 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 from AS
2024-08-20 11:25:14,276 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 from AS
2024-08-20 11:25:25,027 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4550, loss[loss=0.1083, beats_loss=0.01018, ecapa_loss=0.0001454, whisper_loss=0.09671, over 19358.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001399, whisper_loss=0.09054, over 3824167.03 frames. ], batch size: 76, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:25:25,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4787280.0, ans=0.125
2024-08-20 11:25:56,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.635e+01 2.231e+01 2.496e+01 2.825e+01 7.266e+02, threshold=4.992e+01, percent-clipped=1.0
2024-08-20 11:26:04,828 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 19 from LS+wenet, 16 from Vox, 22 from AS
2024-08-20 11:26:08,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4787480.0, ans=0.0
2024-08-20 11:26:23,649 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 20 from LS+wenet, 22 from Vox, 25 from AS
2024-08-20 11:26:25,379 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 from AS
2024-08-20 11:26:43,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4787680.0, ans=0.2
2024-08-20 11:26:45,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4787680.0, ans=10.0
2024-08-20 11:26:55,702 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4600, loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001473, whisper_loss=0.09141, over 22750.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.09033, over 3807848.38 frames. ], batch size: 91, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:26:56,059 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 27 from LS+wenet, 20 from Vox, 40 from AS
2024-08-20 11:26:58,007 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 20 from LS+wenet, 23 from Vox, 41 from AS
2024-08-20 11:27:10,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4787780.0, ans=0.125
2024-08-20 11:27:18,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4787880.0, ans=0.125
2024-08-20 11:27:38,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4787980.0, ans=0.125
2024-08-20 11:27:40,778 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 from AS
2024-08-20 11:28:08,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0
2024-08-20 11:28:23,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4788180.0, ans=0.0
2024-08-20 11:28:26,272 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4650, loss[loss=0.1044, beats_loss=0.01134, ecapa_loss=0.0001255, whisper_loss=0.09184, over 19430.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001409, whisper_loss=0.08992, over 3819093.36 frames. ], batch size: 76, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:28:40,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4788280.0, ans=0.0
2024-08-20 11:28:41,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4788280.0, ans=0.0
2024-08-20 11:28:56,886 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.578e+01 2.344e+01 2.573e+01 2.789e+01 3.818e+01, threshold=5.145e+01, percent-clipped=0.0
2024-08-20 11:29:04,260 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 26 from LS+wenet, 15 from Vox, 40 from AS
2024-08-20 11:29:13,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4788480.0, ans=0.1
2024-08-20 11:29:16,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4788480.0, ans=0.125
2024-08-20 11:29:17,966 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 17 from Vox, 48 from AS
2024-08-20 11:29:39,258 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS
2024-08-20 11:29:55,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4788780.0, ans=0.0
2024-08-20 11:29:56,537 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4700, loss[loss=0.1166, beats_loss=0.007934, ecapa_loss=0.0001768, whisper_loss=0.1069, over 19474.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001406, whisper_loss=0.08991, over 3811636.87 frames. ], batch size: 79, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:30:09,469 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 from AS
2024-08-20 11:30:24,544 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS
2024-08-20 11:30:30,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4788980.0, ans=0.2
2024-08-20 11:30:44,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4788980.0, ans=0.05
2024-08-20 11:31:01,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4789080.0, ans=0.125
2024-08-20 11:31:19,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4789180.0, ans=0.125
2024-08-20 11:31:22,459 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4750, loss[loss=0.1219, beats_loss=0.01037, ecapa_loss=0.0001143, whisper_loss=0.1104, over 22949.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001397, whisper_loss=0.08972, over 3820993.97 frames. ], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:31:32,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4789280.0, ans=0.125
2024-08-20 11:31:52,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.256e+01 2.490e+01 2.747e+01 3.725e+01, threshold=4.981e+01, percent-clipped=0.0
2024-08-20 11:31:59,943 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 from AS
2024-08-20 11:32:31,174 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 17 from Vox, 19 from AS
2024-08-20 11:32:32,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4789680.0, ans=0.1
2024-08-20 11:32:55,387 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4800, loss[loss=0.08783, beats_loss=0.0119, ecapa_loss=0.0001191, whisper_loss=0.07474, over 21388.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.09036, over 3803450.87 frames. ], batch size: 84, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17
2024-08-20 11:32:58,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4789780.0, ans=0.09899494936611666
2024-08-20 11:33:00,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4789780.0, ans=0.04949747468305833
2024-08-20 11:33:12,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5
2024-08-20 11:33:23,644 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 from AS
2024-08-20 11:33:25,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4789880.0, ans=0.125
2024-08-20 11:33:47,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4789980.0, ans=0.0
2024-08-20 11:33:48,926 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts.
14 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 11:33:50,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4789980.0, ans=0.125 2024-08-20 11:33:56,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4790080.0, ans=0.125 2024-08-20 11:34:08,741 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 15 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 11:34:09,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.38 vs. limit=22.5 2024-08-20 11:34:10,903 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-20 11:34:16,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4790180.0, ans=0.125 2024-08-20 11:34:42,513 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4850, loss[loss=0.1282, beats_loss=0.007804, ecapa_loss=0.0001455, whisper_loss=0.119, over 22610.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.0001406, whisper_loss=0.09034, over 3794510.68 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:34:55,774 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 11:35:08,844 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 14 from Vox, 53 fro AS 2024-08-20 11:35:27,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.360e+01 2.620e+01 2.940e+01 4.009e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-20 11:35:28,154 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 
19 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-20 11:36:36,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4790680.0, ans=0.2 2024-08-20 11:36:50,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4790680.0, ans=0.2 2024-08-20 11:36:57,846 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4900, loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001109, whisper_loss=0.09041, over 19898.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.0899, over 3843019.20 frames. ], batch size: 75, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:36:58,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4790780.0, ans=0.0 2024-08-20 11:37:14,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2024-08-20 11:37:47,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4790980.0, ans=0.125 2024-08-20 11:37:54,871 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:38:01,946 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 11:38:10,291 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 29 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-20 11:38:23,934 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 18 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-20 11:38:53,108 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
28 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 11:39:11,012 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 4950, loss[loss=0.1167, beats_loss=0.009132, ecapa_loss=0.0001274, whisper_loss=0.1063, over 21397.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001392, whisper_loss=0.08982, over 3860832.83 frames. ], batch size: 80, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:39:11,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4791280.0, ans=0.1 2024-08-20 11:39:19,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4791280.0, ans=0.125 2024-08-20 11:39:21,568 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 11:39:47,443 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 12 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 11:39:50,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4791380.0, ans=0.0 2024-08-20 11:39:54,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.340e+01 2.516e+01 2.750e+01 4.505e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-20 11:39:59,061 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-20 11:40:04,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4791480.0, ans=0.125 2024-08-20 11:40:45,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.22 vs. 
limit=15.0 2024-08-20 11:40:57,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4791680.0, ans=0.0 2024-08-20 11:41:16,302 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5000, loss[loss=0.1037, beats_loss=0.009998, ecapa_loss=0.0001583, whisper_loss=0.09215, over 15996.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001401, whisper_loss=0.08931, over 3836724.29 frames. ], batch size: 65, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:41:24,109 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 31 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-20 11:41:29,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4791780.0, ans=0.05 2024-08-20 11:41:37,115 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 11:41:51,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4791880.0, ans=0.125 2024-08-20 11:41:59,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-20 11:42:38,183 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 22 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-20 11:42:41,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4792080.0, ans=0.0 2024-08-20 11:43:10,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-20 11:43:18,650 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5050, loss[loss=0.1043, beats_loss=0.007995, ecapa_loss=0.000176, whisper_loss=0.09455, over 17293.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001396, whisper_loss=0.08922, over 3812554.67 frames. ], batch size: 71, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:43:30,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4792280.0, ans=0.125 2024-08-20 11:43:37,694 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 11:43:59,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.312e+01 2.479e+01 2.896e+01 1.864e+02, threshold=4.958e+01, percent-clipped=1.0 2024-08-20 11:45:06,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4792680.0, ans=0.1 2024-08-20 11:45:16,444 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5100, loss[loss=0.1051, beats_loss=0.009203, ecapa_loss=0.0001583, whisper_loss=0.09433, over 19494.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08928, over 3815348.20 frames. ], batch size: 77, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:45:40,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4792880.0, ans=0.1 2024-08-20 11:45:46,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4792880.0, ans=0.5 2024-08-20 11:46:14,657 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 20 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 11:46:42,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4793080.0, ans=0.1 2024-08-20 11:47:13,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.61 vs. 
limit=22.5 2024-08-20 11:47:19,417 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5150, loss[loss=0.1298, beats_loss=0.006094, ecapa_loss=0.0001434, whisper_loss=0.1223, over 16908.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001402, whisper_loss=0.08957, over 3839086.18 frames. ], batch size: 64, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:47:43,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4793380.0, ans=0.07 2024-08-20 11:47:48,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4793380.0, ans=0.0 2024-08-20 11:47:50,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4793380.0, ans=0.1 2024-08-20 11:48:01,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.291e+01 2.565e+01 2.823e+01 3.881e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 11:48:02,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4793380.0, ans=0.2 2024-08-20 11:48:10,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4793480.0, ans=0.125 2024-08-20 11:48:15,181 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 11:48:27,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4793480.0, ans=0.125 2024-08-20 11:48:37,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4793580.0, ans=0.125 2024-08-20 11:49:23,405 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5200, loss[loss=0.1076, beats_loss=0.01146, ecapa_loss=0.0001318, whisper_loss=0.09482, over 23381.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001399, whisper_loss=0.08975, over 3863214.18 frames. ], batch size: 93, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:50:10,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4793880.0, ans=0.04949747468305833 2024-08-20 11:50:29,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4793980.0, ans=0.0 2024-08-20 11:51:27,620 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5250, loss[loss=0.1065, beats_loss=0.01012, ecapa_loss=0.0001838, whisper_loss=0.09459, over 16465.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001398, whisper_loss=0.09041, over 3872414.38 frames. 
], batch size: 67, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:51:28,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4794280.0, ans=0.1 2024-08-20 11:52:02,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4794380.0, ans=0.0 2024-08-20 11:52:11,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.300e+01 2.574e+01 2.899e+01 1.239e+02, threshold=5.148e+01, percent-clipped=2.0 2024-08-20 11:52:14,083 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-20 11:52:24,475 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 16 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 11:52:24,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4794480.0, ans=0.1 2024-08-20 11:52:27,109 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 22 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 11:53:08,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4794680.0, ans=0.0 2024-08-20 11:53:17,734 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 11:53:31,840 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5300, loss[loss=0.1089, beats_loss=0.01016, ecapa_loss=0.0001588, whisper_loss=0.09717, over 19661.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001403, whisper_loss=0.09083, over 3830633.28 frames. 
], batch size: 75, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:54:05,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4794880.0, ans=0.125 2024-08-20 11:54:12,815 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-20 11:54:53,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4795080.0, ans=0.125 2024-08-20 11:55:10,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4795180.0, ans=0.125 2024-08-20 11:55:22,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4795180.0, ans=0.125 2024-08-20 11:55:24,654 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 11:55:29,312 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5350, loss[loss=0.1052, beats_loss=0.0103, ecapa_loss=0.0001416, whisper_loss=0.09351, over 21903.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.09061, over 3829728.72 frames. 
], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:55:54,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4795380.0, ans=0.125 2024-08-20 11:56:10,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.312e+01 2.539e+01 2.746e+01 3.720e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-20 11:56:41,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4795580.0, ans=15.0 2024-08-20 11:56:48,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4795580.0, ans=0.0 2024-08-20 11:57:32,238 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5400, loss[loss=0.1085, beats_loss=0.009313, ecapa_loss=0.0001368, whisper_loss=0.09785, over 22902.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.09076, over 3833986.00 frames. ], batch size: 90, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 11:58:01,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4795880.0, ans=0.0 2024-08-20 11:58:04,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4795880.0, ans=0.125 2024-08-20 11:58:04,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4795880.0, ans=0.125 2024-08-20 11:58:12,193 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
26 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 11:58:46,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4796080.0, ans=0.0 2024-08-20 11:59:07,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4796180.0, ans=0.2 2024-08-20 11:59:23,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4796180.0, ans=0.0 2024-08-20 11:59:27,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4796180.0, ans=0.125 2024-08-20 11:59:35,899 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5450, loss[loss=0.09741, beats_loss=0.01003, ecapa_loss=0.0001432, whisper_loss=0.08595, over 22083.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01024, ecapa_loss=0.0001393, whisper_loss=0.09072, over 3810274.71 frames. ], batch size: 92, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:00:18,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.281e+01 2.461e+01 2.740e+01 4.925e+01, threshold=4.922e+01, percent-clipped=0.0 2024-08-20 12:00:51,531 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 19 from LS+wenet, 20 from Vox, 50 fro AS 2024-08-20 12:00:57,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4796580.0, ans=10.0 2024-08-20 12:00:59,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4796580.0, ans=0.09899494936611666 2024-08-20 12:01:21,019 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-20 12:01:33,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. 
limit=15.0 2024-08-20 12:01:43,293 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5500, loss[loss=0.09373, beats_loss=0.01239, ecapa_loss=0.0001381, whisper_loss=0.07996, over 16557.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001379, whisper_loss=0.0905, over 3807414.32 frames. ], batch size: 69, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:01:43,545 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-20 12:01:57,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4796780.0, ans=0.125 2024-08-20 12:02:17,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4796880.0, ans=0.125 2024-08-20 12:02:30,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4796980.0, ans=0.125 2024-08-20 12:03:06,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2024-08-20 12:03:07,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4797080.0, ans=0.5 2024-08-20 12:03:40,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4797180.0, ans=0.0 2024-08-20 12:03:45,336 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5550, loss[loss=0.07906, beats_loss=0.01052, ecapa_loss=0.0001455, whisper_loss=0.06708, over 15499.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001393, whisper_loss=0.09053, over 3798793.51 frames. 
], batch size: 64, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:03:48,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4797280.0, ans=0.125 2024-08-20 12:04:04,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4797280.0, ans=0.125 2024-08-20 12:04:31,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.285e+01 2.426e+01 2.728e+01 7.340e+01, threshold=4.852e+01, percent-clipped=2.0 2024-08-20 12:04:49,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4797480.0, ans=0.0 2024-08-20 12:05:04,300 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 12:05:04,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4797580.0, ans=0.1 2024-08-20 12:05:21,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0 2024-08-20 12:05:53,997 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5600, loss[loss=0.1101, beats_loss=0.007618, ecapa_loss=0.000184, whisper_loss=0.1006, over 21068.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001398, whisper_loss=0.09075, over 3813169.23 frames. 
], batch size: 89, lr: 1.86e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 12:06:03,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4797780.0, ans=0.125 2024-08-20 12:06:11,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4797780.0, ans=0.125 2024-08-20 12:06:11,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4797780.0, ans=0.125 2024-08-20 12:06:41,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2024-08-20 12:07:46,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4798180.0, ans=0.125 2024-08-20 12:08:08,264 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5650, loss[loss=0.1008, beats_loss=0.01006, ecapa_loss=0.0001244, whisper_loss=0.08954, over 17493.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001403, whisper_loss=0.09017, over 3786210.66 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:08:23,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.09 vs. limit=10.0 2024-08-20 12:08:32,469 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 12:08:51,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.277e+01 2.490e+01 2.736e+01 3.914e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-20 12:08:58,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4798480.0, ans=0.2 2024-08-20 12:09:05,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4798480.0, ans=0.0 2024-08-20 12:09:21,519 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 32 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 12:10:00,604 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 23 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-20 12:10:10,231 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5700, loss[loss=0.1162, beats_loss=0.009046, ecapa_loss=0.000125, whisper_loss=0.1059, over 16626.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01031, ecapa_loss=0.0001394, whisper_loss=0.09062, over 3808887.70 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:11:29,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4799080.0, ans=0.0 2024-08-20 12:11:58,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4799180.0, ans=0.0 2024-08-20 12:12:00,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4799180.0, ans=0.125 2024-08-20 12:12:10,001 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5750, loss[loss=0.1058, beats_loss=0.01217, ecapa_loss=0.0001099, whisper_loss=0.09258, over 23598.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01027, ecapa_loss=0.00014, whisper_loss=0.09086, over 3850339.00 frames. 
], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:12:13,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4799280.0, ans=0.1 2024-08-20 12:12:20,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4799280.0, ans=0.125 2024-08-20 12:12:27,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4799280.0, ans=0.0 2024-08-20 12:12:49,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=4799380.0, ans=0.2 2024-08-20 12:12:51,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.328e+01 2.562e+01 2.759e+01 3.925e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-20 12:13:26,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4799580.0, ans=0.125 2024-08-20 12:13:34,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2024-08-20 12:13:44,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4799580.0, ans=0.125 2024-08-20 12:13:57,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4799680.0, ans=0.1 2024-08-20 12:14:13,757 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5800, loss[loss=0.1112, beats_loss=0.01039, ecapa_loss=0.0001515, whisper_loss=0.09925, over 13999.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01026, ecapa_loss=0.0001415, whisper_loss=0.091, over 3848571.78 frames. 
], batch size: 54, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:14:16,921 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 12:14:37,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4799880.0, ans=0.0 2024-08-20 12:15:07,370 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-480000.pt 2024-08-20 12:15:11,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4799980.0, ans=0.125 2024-08-20 12:15:18,796 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 36 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 12:15:21,860 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 12:15:28,707 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 22 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-20 12:15:37,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5 2024-08-20 12:15:53,401 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 19 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-20 12:16:17,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4800280.0, ans=0.015 2024-08-20 12:16:18,532 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5850, loss[loss=0.1155, beats_loss=0.009934, ecapa_loss=0.0001259, whisper_loss=0.1043, over 22469.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001413, whisper_loss=0.09109, over 3849770.81 frames. 
], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:16:49,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4800380.0, ans=0.2 2024-08-20 12:16:58,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.301e+01 2.506e+01 2.861e+01 6.399e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-20 12:17:39,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4800580.0, ans=0.125 2024-08-20 12:17:40,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-08-20 12:18:07,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-20 12:18:16,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4800780.0, ans=0.125 2024-08-20 12:18:16,852 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5900, loss[loss=0.09845, beats_loss=0.009549, ecapa_loss=0.0001575, whisper_loss=0.08732, over 14025.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001419, whisper_loss=0.09053, over 3824675.77 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:18:18,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.25 vs. limit=15.0 2024-08-20 12:18:32,646 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 12:19:04,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.10 vs. 
limit=12.0 2024-08-20 12:19:25,495 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 from AS 2024-08-20 12:20:05,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.13 vs. limit=22.5 2024-08-20 12:20:11,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-08-20 12:20:15,141 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 5950, loss[loss=0.08564, beats_loss=0.01275, ecapa_loss=0.0001267, whisper_loss=0.07162, over 21119.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001418, whisper_loss=0.09017, over 3830247.49 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:20:23,889 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 36 from LS+wenet, 23 from Vox, 36 from AS 2024-08-20 12:20:30,697 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 32 from LS+wenet, 19 from Vox, 41 from AS 2024-08-20 12:20:39,199 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
31 from LS+wenet, 18 from Vox, 35 from AS 2024-08-20 12:20:45,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4801380.0, ans=0.2 2024-08-20 12:20:55,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.276e+01 2.504e+01 2.875e+01 3.990e+01, threshold=5.008e+01, percent-clipped=0.0 2024-08-20 12:20:59,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4801380.0, ans=0.125 2024-08-20 12:21:00,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4801480.0, ans=0.05 2024-08-20 12:21:20,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4801480.0, ans=0.125 2024-08-20 12:21:29,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2024-08-20 12:21:34,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4801580.0, ans=0.0 2024-08-20 12:22:09,484 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6000, loss[loss=0.101, beats_loss=0.01079, ecapa_loss=0.0001124, whisper_loss=0.08911, over 13544.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.09035, over 3818162.18 frames. ], batch size: 51, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:22:09,485 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 12:22:45,743 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2489, over 931116.00 frames. 
2024-08-20 12:23:08,968 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on SV_voxceleb1: loss=0.003913, beats_loss=0, ecapa_loss=0.0003913, whisper_loss=0, over 944235.00 frames. 2024-08-20 12:24:44,092 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 12:24:44,095 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 12:25:03,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4801880.0, ans=0.0 2024-08-20 12:25:29,348 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 from AS 2024-08-20 12:25:56,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4802080.0, ans=0.0 2024-08-20 12:26:05,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4802080.0, ans=0.125 2024-08-20 12:26:33,393 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 from AS 2024-08-20 12:26:36,928 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6050, loss[loss=0.1143, beats_loss=0.01038, ecapa_loss=0.0001426, whisper_loss=0.1025, over 22511.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001416, whisper_loss=0.08937, over 3777570.68 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:26:46,903 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-20 12:26:50,134 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 19 from Vox, 23 from AS 2024-08-20 12:26:56,182 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
27 from LS+wenet, 24 from Vox, 32 from AS 2024-08-20 12:27:03,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4802380.0, ans=0.0 2024-08-20 12:27:03,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4802380.0, ans=0.125 2024-08-20 12:27:07,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4802380.0, ans=0.0 2024-08-20 12:27:17,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.267e+01 2.529e+01 2.771e+01 4.356e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-20 12:27:41,756 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 from AS 2024-08-20 12:28:10,557 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 25 from LS+wenet, 21 from Vox, 38 from AS 2024-08-20 12:28:29,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4802680.0, ans=0.125 2024-08-20 12:28:35,572 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6100, loss[loss=0.09656, beats_loss=0.01355, ecapa_loss=0.000122, whisper_loss=0.08179, over 17705.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01057, ecapa_loss=0.0001397, whisper_loss=0.08936, over 3796224.24 frames. 
], batch size: 70, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:28:50,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4802780.0, ans=0.125 2024-08-20 12:29:21,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4802980.0, ans=0.125 2024-08-20 12:30:06,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-08-20 12:30:17,422 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 from AS 2024-08-20 12:30:20,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4803180.0, ans=0.125 2024-08-20 12:30:29,496 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6150, loss[loss=0.08194, beats_loss=0.01305, ecapa_loss=0.0001172, whisper_loss=0.06772, over 17474.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001376, whisper_loss=0.08964, over 3792763.00 frames. ], batch size: 71, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:30:43,716 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 16 from Vox, 42 from AS 2024-08-20 12:31:07,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.293e+01 2.457e+01 2.787e+01 2.276e+02, threshold=4.913e+01, percent-clipped=4.0 2024-08-20 12:32:03,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4803680.0, ans=0.125 2024-08-20 12:32:26,737 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6200, loss[loss=0.1051, beats_loss=0.01048, ecapa_loss=0.0001707, whisper_loss=0.09288, over 16725.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001393, whisper_loss=0.09024, over 3773583.79 frames. ], batch size: 71, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:32:45,117 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 35 from LS+wenet, 17 from Vox, 42 from AS 2024-08-20 12:32:50,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4803880.0, ans=0.125 2024-08-20 12:32:52,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.51 vs. limit=15.0 2024-08-20 12:33:47,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4804080.0, ans=0.1 2024-08-20 12:33:57,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4804180.0, ans=0.125 2024-08-20 12:34:07,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2024-08-20 12:34:21,872 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6250, loss[loss=0.08476, beats_loss=0.01273, ecapa_loss=0.0001106, whisper_loss=0.07092, over 23988.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.08999, over 3774731.86 frames. ], batch size: 95, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:34:52,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4804380.0, ans=0.2 2024-08-20 12:34:52,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.42 vs. 
limit=12.0 2024-08-20 12:35:02,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.372e+01 2.624e+01 2.911e+01 4.545e+01, threshold=5.248e+01, percent-clipped=0.0 2024-08-20 12:35:10,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4804480.0, ans=0.0 2024-08-20 12:35:14,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4804480.0, ans=0.2 2024-08-20 12:35:38,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4804580.0, ans=0.125 2024-08-20 12:35:55,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4804680.0, ans=0.95 2024-08-20 12:36:17,589 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6300, loss[loss=0.1057, beats_loss=0.009101, ecapa_loss=0.0001509, whisper_loss=0.09505, over 20175.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.09006, over 3807708.77 frames. ], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:36:26,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4804780.0, ans=0.125 2024-08-20 12:36:28,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4804780.0, ans=0.2 2024-08-20 12:36:44,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.35 vs. 
limit=12.0 2024-08-20 12:36:52,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4804880.0, ans=0.1 2024-08-20 12:37:22,235 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2024-08-20 12:37:53,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4805180.0, ans=0.125 2024-08-20 12:38:13,155 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6350, loss[loss=0.1003, beats_loss=0.01125, ecapa_loss=0.0001098, whisper_loss=0.08795, over 23055.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001404, whisper_loss=0.08977, over 3765777.71 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:38:28,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4805280.0, ans=0.0 2024-08-20 12:38:32,496 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 from AS 2024-08-20 12:38:32,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. 
limit=15.0 2024-08-20 12:38:53,957 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.243e+01 2.449e+01 2.846e+01 7.911e+01, threshold=4.899e+01, percent-clipped=2.0 2024-08-20 12:39:01,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4805480.0, ans=0.025 2024-08-20 12:39:26,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4805580.0, ans=0.125 2024-08-20 12:39:30,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4805580.0, ans=0.125 2024-08-20 12:39:50,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4805680.0, ans=0.125 2024-08-20 12:40:03,419 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 from AS 2024-08-20 12:40:05,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4805680.0, ans=0.07 2024-08-20 12:40:07,760 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 from AS 2024-08-20 12:40:15,141 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6400, loss[loss=0.104, beats_loss=0.009542, ecapa_loss=0.0001711, whisper_loss=0.09271, over 21533.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001407, whisper_loss=0.08942, over 3793200.40 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:40:23,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.71 vs. 
limit=22.5 2024-08-20 12:40:25,214 WARNING [optim.py:496] (0/4) Scaling gradients by 0.015770763158798218, model_norm_threshold=48.98588180541992 2024-08-20 12:40:25,368 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.332e+06, grad_sumsq=1.479e+05, orig_rms_sq=9.003e+00 2024-08-20 12:40:36,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4805880.0, ans=0.2 2024-08-20 12:40:48,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2024-08-20 12:41:04,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=12.0 2024-08-20 12:42:10,945 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6450, loss[loss=0.1067, beats_loss=0.01287, ecapa_loss=0.0001151, whisper_loss=0.09273, over 19580.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01036, ecapa_loss=0.0001413, whisper_loss=0.0891, over 3780023.87 frames. ], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:42:28,785 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 24 from LS+wenet, 19 from Vox, 25 from AS 2024-08-20 12:42:31,757 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
23 from LS+wenet, 20 from Vox, 32 from AS 2024-08-20 12:42:40,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4806380.0, ans=10.0 2024-08-20 12:42:43,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4806380.0, ans=0.125 2024-08-20 12:42:58,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.273e+01 2.543e+01 2.928e+01 3.106e+03, threshold=5.086e+01, percent-clipped=1.0 2024-08-20 12:43:04,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4806480.0, ans=0.1 2024-08-20 12:43:15,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4806480.0, ans=0.125 2024-08-20 12:43:17,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4806480.0, ans=0.0 2024-08-20 12:43:18,439 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07709788531064987, model_norm_threshold=50.86475372314453 2024-08-20 12:43:18,594 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.952e+04, grad_sumsq=6.952e+04, orig_rms_sq=1.000e+00 2024-08-20 12:43:19,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4806480.0, ans=0.125 2024-08-20 12:43:40,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4806580.0, ans=0.05 2024-08-20 12:43:41,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4806580.0, ans=0.125 2024-08-20 12:44:04,148 INFO 
[scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4806680.0, ans=0.125 2024-08-20 12:44:07,336 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6500, loss[loss=0.111, beats_loss=0.01075, ecapa_loss=0.0001548, whisper_loss=0.09875, over 22459.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.08851, over 3732821.02 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:44:25,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-08-20 12:44:30,833 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 from AS 2024-08-20 12:44:57,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4806980.0, ans=0.125 2024-08-20 12:45:04,335 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 27 from LS+wenet, 24 from Vox, 33 from AS 2024-08-20 12:45:33,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4807180.0, ans=0.125 2024-08-20 12:45:40,557 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6550, loss[loss=0.1142, beats_loss=0.008235, ecapa_loss=0.0001609, whisper_loss=0.1044, over 19477.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01033, ecapa_loss=0.000142, whisper_loss=0.08893, over 3742877.59 frames. 
], batch size: 78, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:45:54,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4807280.0, ans=0.0 2024-08-20 12:46:02,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4807380.0, ans=0.2 2024-08-20 12:46:10,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.299e+01 2.491e+01 2.817e+01 6.597e+02, threshold=4.982e+01, percent-clipped=1.0 2024-08-20 12:46:26,572 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 23 from LS+wenet, 13 from Vox, 31 from AS 2024-08-20 12:46:34,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4807480.0, ans=0.025 2024-08-20 12:46:36,803 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 18 from Vox, 44 from AS 2024-08-20 12:46:57,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=12.0 2024-08-20 12:47:15,418 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 21 from Vox, 28 from AS 2024-08-20 12:47:32,741 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6600, loss[loss=0.116, beats_loss=0.009521, ecapa_loss=0.0001561, whisper_loss=0.1049, over 22385.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01032, ecapa_loss=0.0001416, whisper_loss=0.09001, over 3798733.04 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:47:35,684 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 from AS 2024-08-20 12:47:36,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. 
limit=22.5 2024-08-20 12:47:46,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4807780.0, ans=0.125 2024-08-20 12:48:04,947 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 30 from LS+wenet, 28 from Vox, 36 from AS 2024-08-20 12:48:14,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=4807880.0, ans=15.0 2024-08-20 12:48:24,864 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 25 from LS+wenet, 17 from Vox, 35 from AS 2024-08-20 12:48:46,851 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 20 from LS+wenet, 24 from Vox, 27 from AS 2024-08-20 12:49:36,557 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6650, loss[loss=0.08842, beats_loss=0.01029, ecapa_loss=0.0001613, whisper_loss=0.07651, over 19320.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001417, whisper_loss=0.09051, over 3830004.26 frames. ], batch size: 81, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:49:43,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4808280.0, ans=0.0 2024-08-20 12:50:12,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-20 12:50:15,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.590e+01 2.324e+01 2.596e+01 3.081e+01 4.862e+01, threshold=5.192e+01, percent-clipped=0.0 2024-08-20 12:50:30,153 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
29 from LS+wenet, 17 from Vox, 27 from AS 2024-08-20 12:50:32,650 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 12:50:32,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-08-20 12:50:44,330 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 15 from LS+wenet, 19 from Vox, 41 from AS 2024-08-20 12:50:55,894 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 from AS 2024-08-20 12:51:08,483 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 from AS 2024-08-20 12:51:27,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5 2024-08-20 12:51:27,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2024-08-20 12:51:33,286 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6700, loss[loss=0.1122, beats_loss=0.008926, ecapa_loss=0.0001366, whisper_loss=0.1019, over 14812.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001411, whisper_loss=0.09005, over 3833830.48 frames. ], batch size: 57, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:52:30,526 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
27 from LS+wenet, 22 from Vox, 39 from AS 2024-08-20 12:52:32,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4809080.0, ans=0.0 2024-08-20 12:52:37,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4809080.0, ans=0.125 2024-08-20 12:52:43,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4809080.0, ans=0.07 2024-08-20 12:52:44,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=4809080.0, ans=0.1 2024-08-20 12:53:05,830 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6750, loss[loss=0.08559, beats_loss=0.01173, ecapa_loss=0.0001715, whisper_loss=0.07215, over 19601.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01025, ecapa_loss=0.0001419, whisper_loss=0.09065, over 3854108.94 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:53:21,515 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS 2024-08-20 12:53:22,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-08-20 12:53:23,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4809380.0, ans=0.2 2024-08-20 12:53:26,671 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
24 from LS+wenet, 28 from Vox, 35 from AS 2024-08-20 12:53:30,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4809380.0, ans=0.125 2024-08-20 12:53:35,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.304e+01 2.494e+01 2.775e+01 4.602e+01, threshold=4.987e+01, percent-clipped=0.0 2024-08-20 12:53:45,867 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-20 12:54:06,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4809580.0, ans=0.0 2024-08-20 12:54:08,536 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 12:54:20,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4809680.0, ans=0.125 2024-08-20 12:54:28,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4809680.0, ans=0.1 2024-08-20 12:54:32,334 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6800, loss[loss=0.08301, beats_loss=0.01105, ecapa_loss=0.0001591, whisper_loss=0.07037, over 21259.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01025, ecapa_loss=0.0001425, whisper_loss=0.08981, over 3827836.47 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:54:40,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4809780.0, ans=0.125 2024-08-20 12:54:41,446 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
26 from LS+wenet, 24 from Vox, 31 from AS 2024-08-20 12:54:41,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4809780.0, ans=0.0 2024-08-20 12:54:49,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4809880.0, ans=0.2 2024-08-20 12:54:49,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4809880.0, ans=0.125 2024-08-20 12:54:51,782 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 from AS 2024-08-20 12:55:34,030 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 18 from LS+wenet, 21 from Vox, 20 from AS 2024-08-20 12:55:59,933 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6850, loss[loss=0.1037, beats_loss=0.009911, ecapa_loss=0.0001459, whisper_loss=0.09233, over 14958.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001421, whisper_loss=0.08943, over 3822989.85 frames. 
], batch size: 59, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:56:00,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4810280.0, ans=0.5 2024-08-20 12:56:01,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4810280.0, ans=0.125 2024-08-20 12:56:11,412 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.245e+00 2024-08-20 12:56:28,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.376e+01 2.516e+01 2.843e+01 1.582e+02, threshold=5.033e+01, percent-clipped=2.0 2024-08-20 12:56:29,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4810380.0, ans=0.1 2024-08-20 12:56:29,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4810380.0, ans=0.2 2024-08-20 12:56:40,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4810480.0, ans=0.0 2024-08-20 12:56:52,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.90 vs. limit=10.0 2024-08-20 12:56:54,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4810580.0, ans=0.09899494936611666 2024-08-20 12:57:09,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4810680.0, ans=0.2 2024-08-20 12:57:30,359 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6900, loss[loss=0.1117, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.09968, over 21592.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01031, ecapa_loss=0.0001427, whisper_loss=0.09051, over 3833322.37 frames. ], batch size: 86, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:57:49,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4810880.0, ans=0.0 2024-08-20 12:58:03,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4810880.0, ans=0.0 2024-08-20 12:58:07,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4810980.0, ans=0.0 2024-08-20 12:58:26,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4811080.0, ans=0.125 2024-08-20 12:58:29,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4811080.0, ans=0.1 2024-08-20 12:58:36,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4811080.0, ans=0.125 2024-08-20 12:58:43,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4811180.0, ans=0.125 2024-08-20 12:58:59,267 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 6950, loss[loss=0.1013, beats_loss=0.00813, ecapa_loss=0.0001566, whisper_loss=0.09165, over 19684.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001413, whisper_loss=0.09063, over 3802558.93 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 12:59:01,359 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 12:59:30,814 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.235e+01 2.363e+01 2.831e+01 5.596e+01, threshold=4.726e+01, percent-clipped=1.0 2024-08-20 12:59:33,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4811380.0, ans=0.125 2024-08-20 12:59:55,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4811580.0, ans=0.04949747468305833 2024-08-20 12:59:57,320 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 13:00:02,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4811580.0, ans=0.07 2024-08-20 13:00:11,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4811680.0, ans=0.1 2024-08-20 13:00:29,824 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7000, loss[loss=0.09213, beats_loss=0.01036, ecapa_loss=0.0001351, whisper_loss=0.08042, over 20965.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01034, ecapa_loss=0.0001407, whisper_loss=0.09042, over 3832782.20 frames. ], batch size: 84, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:01:20,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4811980.0, ans=0.125 2024-08-20 13:01:22,599 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-20 13:01:34,861 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 13:01:38,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4812080.0, ans=0.2 2024-08-20 13:01:40,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4812180.0, ans=0.125 2024-08-20 13:01:42,416 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 13:01:59,331 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7050, loss[loss=0.1016, beats_loss=0.01274, ecapa_loss=8.076e-05, whisper_loss=0.08802, over 16226.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001408, whisper_loss=0.08986, over 3835389.51 frames. ], batch size: 59, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:02:31,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.241e+01 2.463e+01 2.779e+01 3.668e+01, threshold=4.925e+01, percent-clipped=0.0 2024-08-20 13:02:50,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4812480.0, ans=0.2 2024-08-20 13:03:31,196 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7100, loss[loss=0.0981, beats_loss=0.01115, ecapa_loss=0.0001699, whisper_loss=0.08525, over 16977.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001389, whisper_loss=0.0904, over 3790057.36 frames. 
], batch size: 71, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:04:12,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4812980.0, ans=0.125 2024-08-20 13:04:20,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4812980.0, ans=0.125 2024-08-20 13:04:32,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4813080.0, ans=0.125 2024-08-20 13:04:39,981 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 16 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-20 13:04:42,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4813080.0, ans=0.1 2024-08-20 13:05:03,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4813280.0, ans=0.1 2024-08-20 13:05:04,632 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7150, loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001441, whisper_loss=0.08949, over 21971.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001385, whisper_loss=0.09081, over 3771135.87 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:05:14,446 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 28 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-20 13:05:21,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. 
limit=15.0 2024-08-20 13:05:27,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4813380.0, ans=0.0 2024-08-20 13:05:35,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4813380.0, ans=0.125 2024-08-20 13:05:36,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.222e+01 2.476e+01 2.874e+01 3.378e+02, threshold=4.952e+01, percent-clipped=1.0 2024-08-20 13:05:51,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4813480.0, ans=0.125 2024-08-20 13:06:16,823 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-20 13:06:31,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4813680.0, ans=0.1 2024-08-20 13:06:33,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4813680.0, ans=0.125 2024-08-20 13:06:36,015 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7200, loss[loss=0.1307, beats_loss=0.008219, ecapa_loss=0.0001439, whisper_loss=0.1211, over 21103.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0103, ecapa_loss=0.0001394, whisper_loss=0.09123, over 3817148.68 frames. ], batch size: 86, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:06:41,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4813780.0, ans=0.125 2024-08-20 13:06:43,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.38 vs. 
limit=15.0 2024-08-20 13:06:52,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4813880.0, ans=0.125 2024-08-20 13:06:53,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=12.0 2024-08-20 13:07:07,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4813880.0, ans=0.1 2024-08-20 13:07:12,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=22.5 2024-08-20 13:07:38,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-08-20 13:07:40,893 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 16 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 13:07:48,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4814080.0, ans=0.125 2024-08-20 13:07:56,320 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09890901297330856, model_norm_threshold=49.522193908691406 2024-08-20 13:07:56,473 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.024e+04, grad_sumsq=9.178e+03, orig_rms_sq=3.294e+00 2024-08-20 13:08:08,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4814280.0, ans=0.125 2024-08-20 13:08:10,559 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7250, loss[loss=0.1128, beats_loss=0.008255, ecapa_loss=0.0001627, whisper_loss=0.1029, over 17388.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.0103, ecapa_loss=0.0001412, whisper_loss=0.0906, over 3771206.04 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:08:16,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2024-08-20 13:08:17,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2024-08-20 13:08:19,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4814280.0, ans=0.125 2024-08-20 13:08:36,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4814380.0, ans=0.05 2024-08-20 13:08:41,436 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.256e+01 2.635e+01 2.926e+01 5.007e+02, threshold=5.270e+01, percent-clipped=2.0 2024-08-20 13:09:25,231 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 23 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 13:09:39,684 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7300, loss[loss=0.0949, beats_loss=0.01101, ecapa_loss=0.0001633, whisper_loss=0.08225, over 16343.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01026, ecapa_loss=0.0001419, whisper_loss=0.09005, over 3763658.37 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:09:44,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4814780.0, ans=0.125 2024-08-20 13:10:18,112 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
28 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-20 13:10:34,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-20 13:10:48,293 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:10:52,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4815080.0, ans=0.0 2024-08-20 13:11:17,552 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7350, loss[loss=0.1108, beats_loss=0.008473, ecapa_loss=0.0001423, whisper_loss=0.1009, over 22756.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01029, ecapa_loss=0.0001415, whisper_loss=0.0897, over 3766648.11 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:11:21,248 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 18 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-20 13:11:45,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4815380.0, ans=0.035 2024-08-20 13:11:51,312 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 13:11:52,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.308e+01 2.500e+01 2.769e+01 3.790e+01, threshold=5.001e+01, percent-clipped=0.0 2024-08-20 13:11:55,402 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 13:12:03,477 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 13:12:06,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4815480.0, ans=0.2 2024-08-20 13:12:22,466 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
26 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-20 13:12:46,119 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 21 from LS+wenet, 22 from Vox, 52 fro AS 2024-08-20 13:12:48,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-20 13:13:01,529 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7400, loss[loss=0.1109, beats_loss=0.009965, ecapa_loss=0.0001469, whisper_loss=0.09947, over 24282.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001408, whisper_loss=0.08962, over 3793395.45 frames. ], batch size: 95, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:13:09,113 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-20 13:13:15,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4815780.0, ans=0.125 2024-08-20 13:14:09,613 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 13:14:10,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0 2024-08-20 13:14:38,166 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7450, loss[loss=0.108, beats_loss=0.009554, ecapa_loss=0.0001448, whisper_loss=0.09699, over 16864.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001403, whisper_loss=0.09037, over 3802480.21 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:14:47,747 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 
24 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-20 13:14:58,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4816380.0, ans=0.1 2024-08-20 13:15:11,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.199e+01 2.442e+01 2.667e+01 5.088e+01, threshold=4.883e+01, percent-clipped=1.0 2024-08-20 13:15:21,599 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 13:15:26,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2024-08-20 13:15:32,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4816480.0, ans=6.0 2024-08-20 13:15:39,389 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 13:15:45,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4816580.0, ans=0.125 2024-08-20 13:16:03,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4816680.0, ans=0.125 2024-08-20 13:16:19,360 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7500, loss[loss=0.1348, beats_loss=0.008626, ecapa_loss=0.0001219, whisper_loss=0.125, over 19767.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001406, whisper_loss=0.09036, over 3854675.38 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:16:19,582 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-20 13:16:23,907 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 13:16:33,231 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 13:16:41,669 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 13:16:54,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4816880.0, ans=0.125 2024-08-20 13:17:00,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4816980.0, ans=0.0 2024-08-20 13:17:24,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4817080.0, ans=0.1 2024-08-20 13:17:33,572 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 17 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 13:18:01,084 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7550, loss[loss=0.1106, beats_loss=0.008394, ecapa_loss=0.00015, whisper_loss=0.1007, over 18077.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001396, whisper_loss=0.0901, over 3813843.90 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:18:08,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-20 13:18:16,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.34 vs. 
limit=22.5 2024-08-20 13:18:34,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.221e+01 2.519e+01 2.793e+01 1.462e+02, threshold=5.038e+01, percent-clipped=2.0 2024-08-20 13:18:44,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.12 vs. limit=6.0 2024-08-20 13:18:53,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4817480.0, ans=0.1 2024-08-20 13:19:14,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4817580.0, ans=0.125 2024-08-20 13:19:18,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4817580.0, ans=0.2 2024-08-20 13:19:30,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4817680.0, ans=0.125 2024-08-20 13:19:34,800 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 13:19:42,578 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7600, loss[loss=0.1104, beats_loss=0.00789, ecapa_loss=0.0001811, whisper_loss=0.1007, over 21074.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001409, whisper_loss=0.0907, over 3794394.72 frames. ], batch size: 85, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:19:42,853 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 21 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 13:19:46,931 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 13:19:51,283 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
25 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 13:19:53,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.42 vs. limit=15.0 2024-08-20 13:20:24,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2024-08-20 13:20:41,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4818080.0, ans=0.125 2024-08-20 13:20:52,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4818080.0, ans=0.125 2024-08-20 13:21:12,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4818180.0, ans=0.125 2024-08-20 13:21:19,811 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7650, loss[loss=0.1013, beats_loss=0.01197, ecapa_loss=0.000136, whisper_loss=0.08794, over 20663.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.0901, over 3843168.74 frames. ], batch size: 84, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:21:41,060 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 13:21:48,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2024-08-20 13:21:52,255 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.373e+01 2.628e+01 2.994e+01 5.178e+01, threshold=5.256e+01, percent-clipped=1.0 2024-08-20 13:21:57,045 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:21:58,495 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 
11 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-20 13:22:57,172 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7700, loss[loss=0.1273, beats_loss=0.006362, ecapa_loss=0.0001608, whisper_loss=0.1193, over 17602.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001394, whisper_loss=0.08992, over 3821059.35 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:23:23,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4818880.0, ans=0.1 2024-08-20 13:23:33,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4818980.0, ans=0.125 2024-08-20 13:23:43,364 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 13:24:01,807 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 13:24:24,285 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 13:24:24,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4819180.0, ans=0.0 2024-08-20 13:24:32,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2024-08-20 13:24:39,014 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7750, loss[loss=0.09902, beats_loss=0.01026, ecapa_loss=0.0001273, whisper_loss=0.08749, over 17188.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001404, whisper_loss=0.08967, over 3807139.63 frames. 
], batch size: 68, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 13:24:55,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4819280.0, ans=0.2 2024-08-20 13:25:16,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.228e+01 2.412e+01 2.689e+01 6.555e+01, threshold=4.823e+01, percent-clipped=1.0 2024-08-20 13:25:18,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4819480.0, ans=0.125 2024-08-20 13:25:27,839 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.417e-03 2024-08-20 13:25:34,702 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 13:25:55,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4819580.0, ans=0.2 2024-08-20 13:26:04,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4819680.0, ans=0.1 2024-08-20 13:26:08,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4819680.0, ans=0.125 2024-08-20 13:26:11,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-08-20 13:26:17,565 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7800, loss[loss=0.1277, beats_loss=0.009136, ecapa_loss=0.0001222, whisper_loss=0.1174, over 15227.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001393, whisper_loss=0.08988, over 3799249.06 frames. 
], batch size: 55, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:26:23,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4819780.0, ans=0.125 2024-08-20 13:26:25,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-08-20 13:26:28,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4819780.0, ans=0.1 2024-08-20 13:26:28,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4819780.0, ans=0.0 2024-08-20 13:26:44,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2024-08-20 13:26:46,265 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 19 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 13:26:56,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4819980.0, ans=0.2 2024-08-20 13:27:08,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4819980.0, ans=0.1 2024-08-20 13:27:14,163 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 13:27:14,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4819980.0, ans=0.125 2024-08-20 13:27:27,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2024-08-20 13:27:56,145 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
21 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 13:27:58,222 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7850, loss[loss=0.1026, beats_loss=0.01002, ecapa_loss=0.0001498, whisper_loss=0.09112, over 18575.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.08938, over 3812432.54 frames. ], batch size: 71, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:28:15,957 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 24 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-20 13:28:24,603 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 13:28:34,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.297e+01 2.493e+01 2.759e+01 4.989e+01, threshold=4.986e+01, percent-clipped=1.0 2024-08-20 13:28:41,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4820480.0, ans=0.125 2024-08-20 13:28:55,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.96 vs. limit=22.5 2024-08-20 13:29:03,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-20 13:29:18,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2024-08-20 13:29:24,774 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 13:29:40,366 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7900, loss[loss=0.09692, beats_loss=0.01236, ecapa_loss=0.0001094, whisper_loss=0.08346, over 17967.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.00014, whisper_loss=0.08881, over 3813290.15 frames. 
], batch size: 70, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:29:43,905 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 7 from Vox, 39 fro AS 2024-08-20 13:29:46,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4820780.0, ans=0.0 2024-08-20 13:30:43,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4821080.0, ans=0.5 2024-08-20 13:30:47,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4821080.0, ans=0.0 2024-08-20 13:30:51,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4821080.0, ans=0.1 2024-08-20 13:31:19,931 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 7950, loss[loss=0.1122, beats_loss=0.007359, ecapa_loss=0.00017, whisper_loss=0.1032, over 19781.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001395, whisper_loss=0.08882, over 3827447.70 frames. ], batch size: 80, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:31:52,867 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
15 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-20 13:31:53,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4821380.0, ans=0.125 2024-08-20 13:31:56,047 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.297e+01 2.495e+01 2.761e+01 3.642e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-20 13:31:58,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4821480.0, ans=0.1 2024-08-20 13:32:14,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4821480.0, ans=0.0 2024-08-20 13:32:20,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2024-08-20 13:32:23,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-08-20 13:32:34,655 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 13:32:36,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4821680.0, ans=0.125 2024-08-20 13:32:55,133 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8000, loss[loss=0.09879, beats_loss=0.009916, ecapa_loss=0.0001332, whisper_loss=0.08754, over 22065.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.08933, over 3819338.39 frames. ], batch size: 88, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:33:30,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. 
limit=8.0 2024-08-20 13:33:47,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4821980.0, ans=0.1 2024-08-20 13:34:07,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4822080.0, ans=0.0 2024-08-20 13:34:25,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4822180.0, ans=0.0 2024-08-20 13:34:29,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-20 13:34:32,028 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8050, loss[loss=0.08826, beats_loss=0.01137, ecapa_loss=0.000135, whisper_loss=0.07554, over 18005.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.08865, over 3820794.94 frames. ], batch size: 73, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:34:40,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.59 vs. 
limit=15.0 2024-08-20 13:35:02,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4822380.0, ans=0.1 2024-08-20 13:35:06,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4822380.0, ans=0.125 2024-08-20 13:35:07,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.231e+01 2.467e+01 2.722e+01 2.720e+02, threshold=4.934e+01, percent-clipped=1.0 2024-08-20 13:35:11,132 WARNING [optim.py:496] (0/4) Scaling gradients by 0.020375000312924385, model_norm_threshold=49.342281341552734 2024-08-20 13:35:11,288 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.714e+05, grad_sumsq=7.714e+05, orig_rms_sq=1.000e+00 2024-08-20 13:35:37,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4822580.0, ans=0.0 2024-08-20 13:35:58,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4822680.0, ans=0.1 2024-08-20 13:36:02,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4822680.0, ans=0.0 2024-08-20 13:36:06,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4822680.0, ans=0.1 2024-08-20 13:36:10,106 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8100, loss[loss=0.122, beats_loss=0.01216, ecapa_loss=0.0001237, whisper_loss=0.1086, over 23301.00 frames. ], tot_loss[loss=0.09994, beats_loss=0.01058, ecapa_loss=0.0001407, whisper_loss=0.08795, over 3796560.49 frames. 
], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:36:10,295 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 13:36:12,218 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-20 13:36:22,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4822780.0, ans=0.125 2024-08-20 13:36:39,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4822880.0, ans=0.125 2024-08-20 13:36:40,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-08-20 13:36:44,737 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 13:37:03,159 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 19 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 13:37:13,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4823080.0, ans=0.0 2024-08-20 13:37:13,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4823080.0, ans=0.0 2024-08-20 13:37:17,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4823080.0, ans=0.0 2024-08-20 13:37:33,479 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 30 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 13:37:36,966 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 13:37:50,172 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8150, loss[loss=0.114, beats_loss=0.008427, ecapa_loss=0.0001746, whisper_loss=0.1038, over 20261.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01052, ecapa_loss=0.0001422, whisper_loss=0.08867, over 3819827.89 frames. ], batch size: 83, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:37:56,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4823280.0, ans=0.2 2024-08-20 13:38:06,624 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 13:38:10,757 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 13:38:16,559 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.627e+01 2024-08-20 13:38:23,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.223e+01 2.514e+01 2.822e+01 2.422e+03, threshold=5.028e+01, percent-clipped=2.0 2024-08-20 13:38:31,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4823480.0, ans=0.125 2024-08-20 13:38:34,760 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 13:38:49,227 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-20 13:38:56,212 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-20 13:39:14,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4823680.0, ans=0.1 2024-08-20 13:39:15,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-20 13:39:16,459 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
33 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 13:39:16,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4823680.0, ans=0.125 2024-08-20 13:39:26,039 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8200, loss[loss=0.08029, beats_loss=0.01138, ecapa_loss=0.0001239, whisper_loss=0.06767, over 12499.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001403, whisper_loss=0.08916, over 3800351.16 frames. ], batch size: 50, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:39:56,094 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 13:40:02,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4823880.0, ans=0.125 2024-08-20 13:40:05,601 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 13:40:12,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4823980.0, ans=0.125 2024-08-20 13:40:24,215 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:40:33,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-20 13:40:34,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=12.0 2024-08-20 13:40:38,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4824080.0, ans=0.1 2024-08-20 13:40:44,447 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 13:41:05,390 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8250, loss[loss=0.07246, beats_loss=0.01131, ecapa_loss=0.0001338, whisper_loss=0.05981, over 15621.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01051, ecapa_loss=0.0001392, whisper_loss=0.08868, over 3802467.53 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:41:09,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4824280.0, ans=0.125 2024-08-20 13:41:23,681 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 19 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-20 13:41:40,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.279e+01 2.520e+01 2.852e+01 4.224e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-20 13:41:45,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4824480.0, ans=0.125 2024-08-20 13:41:53,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4824480.0, ans=0.125 2024-08-20 13:42:00,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4824480.0, ans=0.2 2024-08-20 13:42:07,833 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 
24 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 13:42:12,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4824580.0, ans=0.125 2024-08-20 13:42:21,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4824580.0, ans=0.1 2024-08-20 13:42:44,653 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8300, loss[loss=0.0786, beats_loss=0.01273, ecapa_loss=0.0001294, whisper_loss=0.06457, over 22100.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001389, whisper_loss=0.08885, over 3818795.29 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:42:46,372 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 13:42:46,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4824780.0, ans=0.125 2024-08-20 13:43:06,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4824880.0, ans=0.2 2024-08-20 13:43:14,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4824880.0, ans=0.2 2024-08-20 13:43:24,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4824980.0, ans=0.125 2024-08-20 13:43:36,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4824980.0, ans=0.0 2024-08-20 13:43:47,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4825080.0, ans=0.0 2024-08-20 13:44:23,239 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8350, loss[loss=0.08551, 
beats_loss=0.01134, ecapa_loss=0.000111, whisper_loss=0.07307, over 16733.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.000139, whisper_loss=0.08932, over 3822758.36 frames. ], batch size: 66, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:44:30,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-08-20 13:44:34,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4825280.0, ans=0.125 2024-08-20 13:44:42,509 WARNING [optim.py:496] (0/4) Scaling gradients by 0.025893952697515488, model_norm_threshold=50.39912796020508 2024-08-20 13:44:42,676 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.172e+05, grad_sumsq=8.560e+07, orig_rms_sq=1.071e-02 2024-08-20 13:44:45,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4825380.0, ans=0.125 2024-08-20 13:44:58,390 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.380e+01 2.657e+01 2.993e+01 1.946e+03, threshold=5.314e+01, percent-clipped=1.0 2024-08-20 13:45:06,543 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 13:45:06,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4825480.0, ans=0.125 2024-08-20 13:45:20,659 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 13:46:02,257 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8400, loss[loss=0.1109, beats_loss=0.01162, ecapa_loss=0.0001247, whisper_loss=0.09804, over 17198.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.08933, over 3810733.19 frames. ], batch size: 68, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:46:35,226 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 25 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-20 13:46:37,174 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 13:46:43,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4825980.0, ans=0.0 2024-08-20 13:46:48,338 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 26 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-20 13:47:00,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4826080.0, ans=0.0 2024-08-20 13:47:01,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4826080.0, ans=0.125 2024-08-20 13:47:12,085 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 13:47:16,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2024-08-20 13:47:40,623 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8450, loss[loss=0.1256, beats_loss=0.007744, ecapa_loss=0.0001738, whisper_loss=0.1161, over 22392.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001384, whisper_loss=0.08981, over 3790894.13 frames. 
], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:47:44,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4826280.0, ans=0.125 2024-08-20 13:47:59,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2024-08-20 13:48:01,344 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 21 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 13:48:15,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.337e+01 2.560e+01 2.840e+01 5.771e+01, threshold=5.121e+01, percent-clipped=2.0 2024-08-20 13:48:25,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4826480.0, ans=0.125 2024-08-20 13:48:35,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4826480.0, ans=0.1 2024-08-20 13:48:47,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4826580.0, ans=0.125 2024-08-20 13:48:59,922 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 17 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 13:49:06,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4826680.0, ans=0.0 2024-08-20 13:49:10,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.20 vs. limit=22.5 2024-08-20 13:49:17,566 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8500, loss[loss=0.1187, beats_loss=0.0093, ecapa_loss=0.000136, whisper_loss=0.108, over 22495.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.000139, whisper_loss=0.08978, over 3816938.04 frames. 
], batch size: 87, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:49:38,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4826880.0, ans=0.1 2024-08-20 13:49:48,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4826880.0, ans=0.0 2024-08-20 13:49:56,786 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 13:50:15,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4827080.0, ans=0.125 2024-08-20 13:50:19,788 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 24 from LS+wenet, 28 from Vox, 20 fro AS 2024-08-20 13:50:29,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-20 13:50:36,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4827180.0, ans=0.5 2024-08-20 13:50:46,424 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8550, loss[loss=0.1177, beats_loss=0.009025, ecapa_loss=0.0001075, whisper_loss=0.1076, over 18929.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001408, whisper_loss=0.08967, over 3798954.54 frames. ], batch size: 70, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:50:46,648 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 13:50:48,886 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 36 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 13:50:54,204 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
25 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-20 13:51:09,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4827380.0, ans=0.1 2024-08-20 13:51:20,097 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.327e+01 2.509e+01 2.689e+01 1.250e+02, threshold=5.019e+01, percent-clipped=1.0 2024-08-20 13:51:41,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4827580.0, ans=0.125 2024-08-20 13:51:50,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4827580.0, ans=0.2 2024-08-20 13:51:57,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4827580.0, ans=0.0 2024-08-20 13:52:05,100 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 18 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 13:52:10,711 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 13:52:17,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4827780.0, ans=0.125 2024-08-20 13:52:18,429 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8600, loss[loss=0.08464, beats_loss=0.0125, ecapa_loss=0.0001066, whisper_loss=0.07108, over 17795.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.0001402, whisper_loss=0.09003, over 3820178.26 frames. 
], batch size: 70, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:52:18,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4827780.0, ans=0.125 2024-08-20 13:52:25,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4827780.0, ans=0.125 2024-08-20 13:52:25,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4827780.0, ans=0.125 2024-08-20 13:52:33,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-08-20 13:53:32,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4828180.0, ans=0.0 2024-08-20 13:53:47,616 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8650, loss[loss=0.1202, beats_loss=0.008453, ecapa_loss=0.0001727, whisper_loss=0.11, over 21752.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01028, ecapa_loss=0.0001419, whisper_loss=0.08985, over 3814602.50 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:53:58,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4828280.0, ans=0.0 2024-08-20 13:53:59,569 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 13:54:21,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.355e+01 2.590e+01 2.852e+01 2.640e+02, threshold=5.179e+01, percent-clipped=3.0 2024-08-20 13:54:27,759 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-20 13:54:28,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4828480.0, ans=0.0 2024-08-20 13:55:05,477 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 13:55:10,989 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 13:55:15,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.36 vs. limit=22.5 2024-08-20 13:55:21,892 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8700, loss[loss=0.0867, beats_loss=0.009909, ecapa_loss=0.0001209, whisper_loss=0.07558, over 16206.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01032, ecapa_loss=0.0001424, whisper_loss=0.08971, over 3811962.47 frames. ], batch size: 62, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:55:31,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4828780.0, ans=0.125 2024-08-20 13:55:40,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4828880.0, ans=0.2 2024-08-20 13:56:09,964 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 13:56:13,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4828980.0, ans=0.125 2024-08-20 13:56:20,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4829080.0, ans=0.1 2024-08-20 13:56:30,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4829080.0, ans=0.1 2024-08-20 13:56:34,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4829180.0, ans=0.125 2024-08-20 13:56:36,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4829180.0, ans=0.1 2024-08-20 13:56:53,947 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8750, loss[loss=0.1102, beats_loss=0.0108, ecapa_loss=0.0001461, whisper_loss=0.0979, over 16510.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001415, whisper_loss=0.09001, over 3781344.93 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:56:57,998 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 13:57:14,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.95 vs. 
limit=12.0 2024-08-20 13:57:29,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.350e+01 2.524e+01 2.778e+01 9.782e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-20 13:57:42,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4829480.0, ans=0.125 2024-08-20 13:57:44,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0 2024-08-20 13:58:12,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4829680.0, ans=0.1 2024-08-20 13:58:14,287 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 15 from LS+wenet, 28 from Vox, 19 fro AS 2024-08-20 13:58:21,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.41 vs. limit=15.0 2024-08-20 13:58:24,237 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8800, loss[loss=0.105, beats_loss=0.007993, ecapa_loss=0.0001897, whisper_loss=0.09512, over 14181.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001415, whisper_loss=0.0897, over 3767241.04 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 13:58:33,813 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
16 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 13:58:53,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4829880.0, ans=0.0 2024-08-20 13:58:53,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4829880.0, ans=0.2 2024-08-20 13:59:31,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4830080.0, ans=0.125 2024-08-20 13:59:54,782 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-20 13:59:55,618 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8850, loss[loss=0.1034, beats_loss=0.01022, ecapa_loss=0.0001227, whisper_loss=0.09191, over 22835.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001405, whisper_loss=0.08967, over 3812114.19 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:00:02,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4830280.0, ans=0.125 2024-08-20 14:00:08,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4830280.0, ans=0.0 2024-08-20 14:00:18,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2024-08-20 14:00:29,657 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 35 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-20 14:00:30,719 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.235e+01 2.490e+01 2.757e+01 4.655e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-20 14:00:35,377 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 14:00:37,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4830480.0, ans=0.125 2024-08-20 14:00:49,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=4830480.0, ans=0.2 2024-08-20 14:00:58,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4830580.0, ans=0.125 2024-08-20 14:01:17,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-20 14:01:29,249 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8900, loss[loss=0.105, beats_loss=0.01027, ecapa_loss=0.0001093, whisper_loss=0.09365, over 21814.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.09001, over 3863518.53 frames. ], batch size: 83, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:01:33,878 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-20 14:01:37,314 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 15 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 14:01:46,038 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-20 14:01:51,779 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
29 from LS+wenet, 20 from Vox, 23 from AS 2024-08-20 14:02:05,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4830980.0, ans=0.125 2024-08-20 14:02:23,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4831080.0, ans=0.125 2024-08-20 14:02:59,818 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 8950, loss[loss=0.08858, beats_loss=0.01275, ecapa_loss=0.0001319, whisper_loss=0.07451, over 19574.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001397, whisper_loss=0.08935, over 3855240.52 frames. ], batch size: 80, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:03:28,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4831380.0, ans=0.125 2024-08-20 14:03:30,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.318e+01 2.492e+01 2.834e+01 3.721e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-20 14:03:34,766 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 from AS 2024-08-20 14:04:05,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4831580.0, ans=0.125 2024-08-20 14:04:18,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4831680.0, ans=0.125 2024-08-20 14:04:26,523 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9000, loss[loss=0.1058, beats_loss=0.008905, ecapa_loss=0.0001663, whisper_loss=0.09524, over 18616.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08968, over 3840025.73 frames.
], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:04:26,525 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 14:05:10,549 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.0005032, whisper_loss=0.2493, over 931116.00 frames. 2024-08-20 14:05:34,784 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on SV_voxceleb1: loss=0.003984, beats_loss=0, ecapa_loss=0.0003984, whisper_loss=0, over 944235.00 frames. 2024-08-20 14:07:37,296 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 14:07:37,300 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 14:07:37,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4831780.0, ans=0.125 2024-08-20 14:07:41,974 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 from AS 2024-08-20 14:07:49,133 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 from AS 2024-08-20 14:07:56,314 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts.
23 from LS+wenet, 26 from Vox, 44 from AS 2024-08-20 14:08:08,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4831880.0, ans=0.125 2024-08-20 14:08:26,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4832080.0, ans=0.2 2024-08-20 14:08:36,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4832080.0, ans=0.125 2024-08-20 14:08:44,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4832180.0, ans=0.1 2024-08-20 14:08:49,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=12.0 2024-08-20 14:08:51,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4832180.0, ans=0.0 2024-08-20 14:09:00,243 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9050, loss[loss=0.09312, beats_loss=0.01032, ecapa_loss=0.0001558, whisper_loss=0.08124, over 16687.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001391, whisper_loss=0.08956, over 3821890.25 frames. ], batch size: 69, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:09:05,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4832280.0, ans=0.0 2024-08-20 14:09:13,580 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts.
38 from LS+wenet, 19 from Vox, 36 from AS 2024-08-20 14:09:18,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4832380.0, ans=0.125 2024-08-20 14:09:29,417 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.212e+01 2.355e+01 2.625e+01 3.620e+01, threshold=4.711e+01, percent-clipped=0.0 2024-08-20 14:10:02,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4832580.0, ans=0.0 2024-08-20 14:10:24,905 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9100, loss[loss=0.08764, beats_loss=0.01036, ecapa_loss=0.0001735, whisper_loss=0.07555, over 21821.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001389, whisper_loss=0.08965, over 3826229.40 frames. ], batch size: 93, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:10:28,148 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 from AS 2024-08-20 14:10:44,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-20 14:10:58,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-20 14:11:12,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4832980.0, ans=0.2 2024-08-20 14:11:22,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=22.5 2024-08-20 14:11:35,782 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts.
30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-20 14:11:39,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2024-08-20 14:11:44,874 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 from AS 2024-08-20 14:11:52,707 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9150, loss[loss=0.09224, beats_loss=0.0124, ecapa_loss=0.0001308, whisper_loss=0.07853, over 15534.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001379, whisper_loss=0.08945, over 3860438.00 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:11:54,413 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 16 from LS+wenet, 26 from Vox, 27 from AS 2024-08-20 14:11:54,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4833280.0, ans=0.125 2024-08-20 14:12:04,285 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 30 from LS+wenet, 18 from Vox, 38 from AS 2024-08-20 14:12:10,492 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-20 14:12:17,621 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 from AS 2024-08-20 14:12:22,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.265e+01 2.494e+01 2.821e+01 1.323e+02, threshold=4.988e+01, percent-clipped=2.0 2024-08-20 14:12:25,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4833480.0, ans=0.125 2024-08-20 14:12:25,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs.
limit=15.0 2024-08-20 14:12:33,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4833480.0, ans=0.1 2024-08-20 14:12:36,602 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 25 from LS+wenet, 14 from Vox, 18 from AS 2024-08-20 14:13:19,902 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9200, loss[loss=0.09917, beats_loss=0.01298, ecapa_loss=9.444e-05, whisper_loss=0.08525, over 17447.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001384, whisper_loss=0.08978, over 3852008.03 frames. ], batch size: 69, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:13:29,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4833780.0, ans=0.2 2024-08-20 14:13:37,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4833880.0, ans=0.125 2024-08-20 14:13:56,034 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 from AS 2024-08-20 14:13:58,174 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 from AS 2024-08-20 14:13:58,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4833980.0, ans=0.125 2024-08-20 14:14:19,196 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 33 from LS+wenet, 16 from Vox, 39 from AS 2024-08-20 14:14:24,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.12 vs.
limit=12.0 2024-08-20 14:14:28,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4834180.0, ans=0.125 2024-08-20 14:14:33,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4834180.0, ans=0.0 2024-08-20 14:14:39,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4834180.0, ans=0.0 2024-08-20 14:14:46,039 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9250, loss[loss=0.09111, beats_loss=0.008103, ecapa_loss=0.0001212, whisper_loss=0.08179, over 14365.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.09072, over 3833237.44 frames. ], batch size: 53, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:14:51,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-20 14:14:54,170 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=22.5 2024-08-20 14:14:59,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4834280.0, ans=0.125 2024-08-20 14:15:10,633 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
23 from LS+wenet, 27 from Vox, 37 from AS 2024-08-20 14:15:14,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4834380.0, ans=0.2 2024-08-20 14:15:15,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4834380.0, ans=0.09899494936611666 2024-08-20 14:15:16,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.272e+01 2.596e+01 3.076e+01 4.662e+01, threshold=5.191e+01, percent-clipped=0.0 2024-08-20 14:15:19,240 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 from AS 2024-08-20 14:15:34,716 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 28 from LS+wenet, 18 from Vox, 48 from AS 2024-08-20 14:15:54,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4834680.0, ans=0.1 2024-08-20 14:16:01,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4834680.0, ans=0.125 2024-08-20 14:16:05,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4834680.0, ans=0.125 2024-08-20 14:16:06,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4834680.0, ans=0.125 2024-08-20 14:16:13,253 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9300, loss[loss=0.1262, beats_loss=0.008556, ecapa_loss=0.0001521, whisper_loss=0.1162, over 22983.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001392, whisper_loss=0.09062, over 3873966.67 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:16:34,015 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts.
32 from LS+wenet, 23 from Vox, 41 from AS 2024-08-20 14:16:42,809 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-20 14:16:52,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4834980.0, ans=0.0 2024-08-20 14:16:53,750 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 17 from LS+wenet, 14 from Vox, 19 from AS 2024-08-20 14:16:54,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4834980.0, ans=0.1 2024-08-20 14:16:59,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4834980.0, ans=0.1 2024-08-20 14:17:44,331 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9350, loss[loss=0.1022, beats_loss=0.01158, ecapa_loss=0.0001621, whisper_loss=0.08904, over 22324.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001399, whisper_loss=0.0899, over 3874054.28 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:17:47,225 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 30 from LS+wenet, 26 from Vox, 38 from AS 2024-08-20 14:17:49,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4835280.0, ans=0.125 2024-08-20 14:18:10,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4835380.0, ans=0.09899494936611666 2024-08-20 14:18:17,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.243e+01 2.510e+01 2.725e+01 8.699e+01, threshold=5.020e+01, percent-clipped=1.0 2024-08-20 14:18:23,301 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 19 from LS+wenet, 13 from Vox, 18 from AS 2024-08-20 14:18:30,217 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts.
20 from LS+wenet, 16 from Vox, 26 from AS 2024-08-20 14:18:33,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4835480.0, ans=0.125 2024-08-20 14:18:35,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4835480.0, ans=0.5 2024-08-20 14:18:42,250 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 23 from LS+wenet, 28 from Vox, 36 from AS 2024-08-20 14:18:47,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4835580.0, ans=0.125 2024-08-20 14:19:16,739 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9400, loss[loss=0.07361, beats_loss=0.01463, ecapa_loss=9.692e-05, whisper_loss=0.05801, over 22683.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001391, whisper_loss=0.08915, over 3851635.54 frames. ], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:19:27,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2024-08-20 14:19:32,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=4835780.0, ans=0.025 2024-08-20 14:19:56,494 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 23 from LS+wenet, 21 from Vox, 32 from AS 2024-08-20 14:20:08,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4835980.0, ans=0.125 2024-08-20 14:20:30,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.72 vs.
limit=22.5 2024-08-20 14:20:34,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4836180.0, ans=0.125 2024-08-20 14:20:47,415 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9450, loss[loss=0.1183, beats_loss=0.008259, ecapa_loss=0.0001777, whisper_loss=0.1083, over 21987.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.08926, over 3845079.80 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:21:05,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2024-08-20 14:21:20,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.301e+01 2.582e+01 2.889e+01 4.439e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-20 14:21:21,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2024-08-20 14:21:22,997 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 from AS 2024-08-20 14:21:24,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4836480.0, ans=0.125 2024-08-20 14:21:32,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.86 vs. limit=15.0 2024-08-20 14:21:46,713 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts.
22 from LS+wenet, 19 from Vox, 22 from AS 2024-08-20 14:22:01,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4836680.0, ans=0.125 2024-08-20 14:22:21,457 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9500, loss[loss=0.1064, beats_loss=0.0107, ecapa_loss=0.0001286, whisper_loss=0.09446, over 15487.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001404, whisper_loss=0.08924, over 3848172.56 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:22:32,517 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 12 from Vox, 34 from AS 2024-08-20 14:22:49,653 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 30 from LS+wenet, 22 from Vox, 31 from AS 2024-08-20 14:22:51,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4836880.0, ans=0.2 2024-08-20 14:23:04,884 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 13 from LS+wenet, 14 from Vox, 23 from AS 2024-08-20 14:23:28,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4837080.0, ans=0.09899494936611666 2024-08-20 14:23:49,517 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9550, loss[loss=0.1211, beats_loss=0.009655, ecapa_loss=0.000146, whisper_loss=0.11, over 22502.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001398, whisper_loss=0.08902, over 3802384.84 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:24:00,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4837280.0, ans=0.0 2024-08-20 14:24:02,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.72 vs.
limit=15.0 2024-08-20 14:24:09,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4837380.0, ans=0.09899494936611666 2024-08-20 14:24:18,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4837380.0, ans=0.125 2024-08-20 14:24:21,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.249e+01 2.469e+01 2.805e+01 3.890e+01, threshold=4.937e+01, percent-clipped=0.0 2024-08-20 14:24:21,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4837380.0, ans=0.0 2024-08-20 14:24:27,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4837480.0, ans=0.125 2024-08-20 14:24:46,977 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 16 from LS+wenet, 24 from Vox, 23 from AS 2024-08-20 14:25:02,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4837680.0, ans=0.125 2024-08-20 14:25:04,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4837680.0, ans=0.04949747468305833 2024-08-20 14:25:19,322 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9600, loss[loss=0.1027, beats_loss=0.01113, ecapa_loss=0.0001205, whisper_loss=0.09032, over 23091.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001398, whisper_loss=0.09005, over 3810474.53 frames.
], batch size: 92, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:25:27,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4837780.0, ans=0.0 2024-08-20 14:25:59,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-20 14:26:04,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4837980.0, ans=0.125 2024-08-20 14:26:22,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4838080.0, ans=0.2 2024-08-20 14:26:24,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2024-08-20 14:26:32,259 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 21 from LS+wenet, 30 from Vox, 45 from AS 2024-08-20 14:26:39,628 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 from AS 2024-08-20 14:26:48,009 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9650, loss[loss=0.116, beats_loss=0.007692, ecapa_loss=0.0001411, whisper_loss=0.1069, over 16270.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.08979, over 3807340.68 frames. ], batch size: 61, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:26:52,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-20 14:26:54,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4838280.0, ans=0.125 2024-08-20 14:27:16,173 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts.
14 from LS+wenet, 16 from Vox, 19 from AS 2024-08-20 14:27:20,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.272e+01 2.458e+01 2.754e+01 3.251e+01, threshold=4.916e+01, percent-clipped=0.0 2024-08-20 14:27:32,381 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 from AS 2024-08-20 14:28:00,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4838580.0, ans=0.125 2024-08-20 14:28:15,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2024-08-20 14:28:18,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4838680.0, ans=0.05 2024-08-20 14:28:21,302 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9700, loss[loss=0.105, beats_loss=0.01037, ecapa_loss=0.0001436, whisper_loss=0.09315, over 17392.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001391, whisper_loss=0.08964, over 3812931.44 frames. ], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:29:08,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4838980.0, ans=0.0 2024-08-20 14:29:24,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4839080.0, ans=0.125 2024-08-20 14:29:30,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.63 vs.
limit=15.0 2024-08-20 14:29:31,891 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 14:29:55,251 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9750, loss[loss=0.09911, beats_loss=0.0122, ecapa_loss=0.0001496, whisper_loss=0.08542, over 21533.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001388, whisper_loss=0.08891, over 3814149.60 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:29:55,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4839280.0, ans=0.125 2024-08-20 14:30:01,846 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 from AS 2024-08-20 14:30:04,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4839280.0, ans=0.0 2024-08-20 14:30:30,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.339e+01 2.577e+01 2.928e+01 5.580e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:30:32,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4839480.0, ans=0.125 2024-08-20 14:30:44,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-08-20 14:30:54,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4839580.0, ans=0.0 2024-08-20 14:30:56,355 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 13 from Vox, 29 from AS 2024-08-20 14:30:58,205 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS 2024-08-20 14:30:59,557 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts.
20 from LS+wenet, 26 from Vox, 27 from AS 2024-08-20 14:31:14,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4839680.0, ans=0.015 2024-08-20 14:31:26,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-08-20 14:31:28,880 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9800, loss[loss=0.09516, beats_loss=0.008985, ecapa_loss=0.0001591, whisper_loss=0.08458, over 15442.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01033, ecapa_loss=0.0001392, whisper_loss=0.08956, over 3786683.89 frames. ], batch size: 60, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:32:06,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0 2024-08-20 14:32:09,881 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-484000.pt 2024-08-20 14:32:25,458 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 35 from LS+wenet, 20 from Vox, 35 from AS 2024-08-20 14:32:57,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4840180.0, ans=0.125 2024-08-20 14:33:00,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.02 vs. limit=10.0 2024-08-20 14:33:03,758 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts.
15 from LS+wenet, 20 from Vox, 21 from AS 2024-08-20 14:33:06,946 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9850, loss[loss=0.1023, beats_loss=0.01188, ecapa_loss=0.0001026, whisper_loss=0.08943, over 23475.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.000139, whisper_loss=0.08948, over 3792799.55 frames. ], batch size: 94, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:33:11,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4840280.0, ans=0.0 2024-08-20 14:33:42,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.298e+01 2.480e+01 2.698e+01 3.610e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 14:34:11,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4840580.0, ans=0.0 2024-08-20 14:34:15,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=12.0 2024-08-20 14:34:30,534 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 24 from LS+wenet, 9 from Vox, 23 from AS 2024-08-20 14:34:36,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4840680.0, ans=0.1 2024-08-20 14:34:45,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4840780.0, ans=0.0 2024-08-20 14:34:46,793 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9900, loss[loss=0.1005, beats_loss=0.009899, ecapa_loss=0.0001405, whisper_loss=0.08919, over 16845.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08919, over 3804476.39 frames.
], batch size: 67, lr: 1.85e-03, grad_scale: 1.152921504606847e+18 2024-08-20 14:35:02,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4840780.0, ans=0.2 2024-08-20 14:35:13,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4840880.0, ans=0.125 2024-08-20 14:35:17,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4840880.0, ans=0.125 2024-08-20 14:35:40,930 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.904e+01 2024-08-20 14:36:15,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4841180.0, ans=0.1 2024-08-20 14:36:25,360 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 9950, loss[loss=0.1121, beats_loss=0.01005, ecapa_loss=0.0001498, whisper_loss=0.1006, over 22151.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001391, whisper_loss=0.08908, over 3789763.93 frames. ], batch size: 89, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:36:46,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4841380.0, ans=0.07 2024-08-20 14:36:49,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=22.5 2024-08-20 14:36:50,916 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 30 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 14:36:59,339 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.238e+01 2.460e+01 2.685e+01 1.158e+02, threshold=4.920e+01, percent-clipped=1.0 2024-08-20 14:37:03,859 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 14:37:23,666 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 14:37:30,427 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 14:37:40,564 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 35 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-20 14:37:51,714 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10000, loss[loss=0.1165, beats_loss=0.008313, ecapa_loss=0.0001314, whisper_loss=0.1069, over 20396.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001391, whisper_loss=0.08916, over 3790336.52 frames. ], batch size: 77, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:38:32,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4841980.0, ans=0.125 2024-08-20 14:39:12,900 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 38 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 14:39:19,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4842180.0, ans=0.07 2024-08-20 14:39:21,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4842180.0, ans=0.1 2024-08-20 14:39:33,016 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10050, loss[loss=0.1059, beats_loss=0.01098, ecapa_loss=0.0001617, whisper_loss=0.09329, over 17476.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001394, whisper_loss=0.08942, over 3781090.29 frames. 
], batch size: 72, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:39:41,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4842280.0, ans=0.0 2024-08-20 14:39:58,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4842380.0, ans=0.125 2024-08-20 14:40:18,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.420e+01 2.635e+01 2.918e+01 2.672e+02, threshold=5.270e+01, percent-clipped=3.0 2024-08-20 14:40:31,219 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 14:40:38,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2024-08-20 14:40:45,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=12.0 2024-08-20 14:40:47,223 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 14:41:00,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2024-08-20 14:41:02,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0 2024-08-20 14:41:29,583 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
26 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 14:41:32,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=4842780.0, ans=15.0 2024-08-20 14:41:33,063 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10100, loss[loss=0.1048, beats_loss=0.009549, ecapa_loss=0.0001449, whisper_loss=0.09376, over 19905.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001386, whisper_loss=0.08936, over 3837959.99 frames. ], batch size: 75, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:41:42,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4842780.0, ans=0.0 2024-08-20 14:41:45,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4842780.0, ans=0.125 2024-08-20 14:41:47,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4842780.0, ans=0.125 2024-08-20 14:42:27,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4842980.0, ans=0.0 2024-08-20 14:42:31,834 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 18 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-20 14:42:47,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4843080.0, ans=10.0 2024-08-20 14:43:03,860 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 
23 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-20 14:43:27,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4843280.0, ans=0.0 2024-08-20 14:43:28,199 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10150, loss[loss=0.1132, beats_loss=0.008486, ecapa_loss=0.0001245, whisper_loss=0.1034, over 20233.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001394, whisper_loss=0.08963, over 3849537.43 frames. ], batch size: 75, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:43:52,548 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 14:44:03,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.366e+01 2.588e+01 2.870e+01 1.184e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-20 14:44:18,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4843480.0, ans=0.125 2024-08-20 14:44:36,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-08-20 14:44:49,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4843680.0, ans=0.2 2024-08-20 14:44:53,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4843680.0, ans=0.2 2024-08-20 14:44:57,148 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10200, loss[loss=0.09441, beats_loss=0.01251, ecapa_loss=0.0001238, whisper_loss=0.08067, over 23260.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001384, whisper_loss=0.08997, over 3857901.61 frames. 
], batch size: 95, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:44:57,625 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 14:45:07,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4843780.0, ans=0.0 2024-08-20 14:45:07,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4843780.0, ans=0.125 2024-08-20 14:45:44,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4843980.0, ans=0.125 2024-08-20 14:45:52,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4844080.0, ans=0.05 2024-08-20 14:46:04,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.11 vs. limit=6.0 2024-08-20 14:46:10,113 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 14:46:17,655 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 14:46:17,944 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.852e+05 2024-08-20 14:46:27,323 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10250, loss[loss=0.1039, beats_loss=0.008299, ecapa_loss=0.0001977, whisper_loss=0.09358, over 15786.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001387, whisper_loss=0.08943, over 3847777.90 frames. ], batch size: 67, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:46:38,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.24 vs. 
limit=15.0 2024-08-20 14:46:44,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4844280.0, ans=0.1 2024-08-20 14:47:03,906 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.666e+01 2.274e+01 2.551e+01 2.893e+01 4.019e+02, threshold=5.101e+01, percent-clipped=3.0 2024-08-20 14:47:16,987 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 14:47:35,378 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 14:47:52,193 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 14:48:02,605 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10300, loss[loss=0.09203, beats_loss=0.008894, ecapa_loss=0.0001803, whisper_loss=0.08133, over 13087.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001398, whisper_loss=0.08975, over 3801441.71 frames. ], batch size: 55, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:48:08,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4844780.0, ans=0.0 2024-08-20 14:48:10,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4844780.0, ans=0.125 2024-08-20 14:48:18,414 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 14:48:18,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4844780.0, ans=0.125 2024-08-20 14:48:30,984 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 
16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-20 14:48:52,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4844980.0, ans=0.1 2024-08-20 14:49:01,396 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 16 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 14:49:03,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4844980.0, ans=0.1 2024-08-20 14:49:13,587 WARNING [optim.py:496] (0/4) Scaling gradients by 0.02814776450395584, model_norm_threshold=51.010257720947266 2024-08-20 14:49:13,741 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.746e+05, grad_sumsq=8.746e+05, orig_rms_sq=1.000e+00 2024-08-20 14:49:19,238 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 14:49:28,157 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 26 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 14:49:50,239 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10350, loss[loss=0.08723, beats_loss=0.01315, ecapa_loss=0.0001574, whisper_loss=0.07251, over 22424.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001407, whisper_loss=0.08968, over 3789182.21 frames. ], batch size: 95, lr: 1.85e-03, grad_scale: 5.764607523034235e+17 2024-08-20 14:49:51,385 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 14:49:59,649 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 14:50:09,431 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 14:50:09,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=12.0 2024-08-20 14:50:25,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4845380.0, ans=0.1 2024-08-20 14:50:26,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4845380.0, ans=0.1 2024-08-20 14:50:33,512 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.009e+01 2024-08-20 14:50:37,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.346e+01 2.546e+01 2.818e+01 1.812e+03, threshold=5.092e+01, percent-clipped=2.0 2024-08-20 14:50:57,715 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 14:51:12,978 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 13 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 14:51:18,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4845580.0, ans=0.125 2024-08-20 14:51:30,634 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 14:51:41,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4845680.0, ans=0.125 2024-08-20 14:51:50,075 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 23 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 14:51:54,349 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10400, loss[loss=0.09955, beats_loss=0.01049, ecapa_loss=0.0001457, whisper_loss=0.0876, over 18452.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.00014, whisper_loss=0.09039, over 3827950.80 frames. 
], batch size: 74, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:51:55,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4845780.0, ans=0.125 2024-08-20 14:52:04,556 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.235e+00 2024-08-20 14:52:15,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4845880.0, ans=0.09899494936611666 2024-08-20 14:53:12,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4846080.0, ans=0.07 2024-08-20 14:53:53,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4846180.0, ans=0.1 2024-08-20 14:53:56,084 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10450, loss[loss=0.1091, beats_loss=0.00975, ecapa_loss=0.0001687, whisper_loss=0.09768, over 17852.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001406, whisper_loss=0.09007, over 3810925.26 frames. ], batch size: 74, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:54:08,788 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 14:54:17,896 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 14:54:27,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.31 vs. limit=15.0 2024-08-20 14:54:32,018 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 14:54:42,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.294e+01 2.565e+01 2.796e+01 8.122e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-20 14:55:15,322 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 24 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 14:55:30,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-08-20 14:55:31,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4846680.0, ans=0.125 2024-08-20 14:55:54,033 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10500, loss[loss=0.1053, beats_loss=0.01066, ecapa_loss=0.0001601, whisper_loss=0.09308, over 22379.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001422, whisper_loss=0.09067, over 3845964.44 frames. ], batch size: 91, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:55:55,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4846780.0, ans=0.1 2024-08-20 14:55:57,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4846780.0, ans=0.0 2024-08-20 14:56:33,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.24 vs. limit=22.5 2024-08-20 14:56:54,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4846980.0, ans=0.0 2024-08-20 14:57:00,941 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
30 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 14:57:05,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4847080.0, ans=0.2 2024-08-20 14:57:19,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4847080.0, ans=0.125 2024-08-20 14:57:40,315 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 14:57:42,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4847180.0, ans=0.125 2024-08-20 14:57:50,343 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10550, loss[loss=0.08014, beats_loss=0.01251, ecapa_loss=9.39e-05, whisper_loss=0.0667, over 18986.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001414, whisper_loss=0.09042, over 3811798.43 frames. ], batch size: 74, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:57:51,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4847280.0, ans=0.125 2024-08-20 14:57:57,332 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.390e+00 2024-08-20 14:57:57,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4847280.0, ans=0.0 2024-08-20 14:58:02,591 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
35 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 14:58:09,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4847280.0, ans=0.0 2024-08-20 14:58:19,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4847380.0, ans=0.125 2024-08-20 14:58:35,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4847480.0, ans=0.1 2024-08-20 14:58:35,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.331e+01 2.577e+01 2.959e+01 5.357e+01, threshold=5.154e+01, percent-clipped=1.0 2024-08-20 14:59:08,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4847580.0, ans=0.0 2024-08-20 14:59:23,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4847680.0, ans=0.125 2024-08-20 14:59:30,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4847680.0, ans=0.0 2024-08-20 14:59:45,940 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10600, loss[loss=0.09607, beats_loss=0.01154, ecapa_loss=0.000134, whisper_loss=0.08319, over 15542.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001408, whisper_loss=0.09032, over 3837211.22 frames. 
], batch size: 63, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 14:59:54,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4847780.0, ans=0.125 2024-08-20 15:00:36,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4847980.0, ans=0.0 2024-08-20 15:01:12,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4848080.0, ans=0.2 2024-08-20 15:01:22,689 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 15:01:47,021 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10650, loss[loss=0.1092, beats_loss=0.009478, ecapa_loss=0.000137, whisper_loss=0.09832, over 22891.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001397, whisper_loss=0.0894, over 3815823.33 frames. ], batch size: 90, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:01:48,797 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 25 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-20 15:01:54,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4848280.0, ans=0.0 2024-08-20 15:01:57,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4848280.0, ans=0.0 2024-08-20 15:02:09,098 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-20 15:02:32,584 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
29 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-20 15:02:38,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.606e+01 2.292e+01 2.599e+01 2.866e+01 5.790e+01, threshold=5.197e+01, percent-clipped=1.0 2024-08-20 15:02:54,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4848480.0, ans=0.1 2024-08-20 15:02:59,442 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 15:03:37,404 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 23 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 15:03:50,426 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 30 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 15:03:54,789 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10700, loss[loss=0.1226, beats_loss=0.009293, ecapa_loss=0.0001284, whisper_loss=0.1121, over 23708.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001405, whisper_loss=0.0894, over 3823187.78 frames. ], batch size: 87, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:03:57,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4848780.0, ans=0.125 2024-08-20 15:04:19,130 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 15:04:26,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4848880.0, ans=0.07 2024-08-20 15:05:03,286 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 25 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 15:05:05,746 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 15:05:08,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. 
limit=15.0 2024-08-20 15:05:10,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4849080.0, ans=0.0 2024-08-20 15:05:16,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2024-08-20 15:05:18,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-20 15:06:00,156 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10750, loss[loss=0.1062, beats_loss=0.01058, ecapa_loss=0.0001286, whisper_loss=0.09438, over 14675.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001396, whisper_loss=0.08975, over 3847441.76 frames. ], batch size: 54, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:06:16,640 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 15:06:16,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4849280.0, ans=0.0 2024-08-20 15:06:33,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4849380.0, ans=0.125 2024-08-20 15:06:45,857 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 15:06:49,431 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.343e+01 2.598e+01 3.019e+01 8.630e+01, threshold=5.195e+01, percent-clipped=2.0 2024-08-20 15:06:57,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4849480.0, ans=0.0 2024-08-20 15:07:03,237 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
13 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 15:07:18,827 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 15:07:24,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4849580.0, ans=0.05 2024-08-20 15:07:51,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4849680.0, ans=10.0 2024-08-20 15:07:54,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4849680.0, ans=0.125 2024-08-20 15:07:57,578 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10800, loss[loss=0.1044, beats_loss=0.01084, ecapa_loss=0.0001243, whisper_loss=0.0923, over 23834.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001402, whisper_loss=0.08951, over 3825491.72 frames. ], batch size: 94, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:08:16,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4849780.0, ans=0.0 2024-08-20 15:08:33,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4849880.0, ans=0.125 2024-08-20 15:08:37,427 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 15:09:02,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4849980.0, ans=0.125 2024-08-20 15:09:20,901 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 15:09:30,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4850180.0, ans=0.0 2024-08-20 15:09:39,823 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 15:09:47,855 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 15:09:53,687 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10850, loss[loss=0.09099, beats_loss=0.00685, ecapa_loss=0.0001815, whisper_loss=0.08233, over 14171.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001397, whisper_loss=0.08991, over 3789782.64 frames. ], batch size: 54, lr: 1.85e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:10:00,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4850280.0, ans=0.125 2024-08-20 15:10:02,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4850280.0, ans=0.125 2024-08-20 15:10:04,894 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 33 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-20 15:10:05,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4850280.0, ans=0.1 2024-08-20 15:10:07,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4850280.0, ans=0.125 2024-08-20 15:10:24,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4850380.0, ans=0.125 2024-08-20 15:10:30,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.03 vs. 
limit=22.5 2024-08-20 15:10:42,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.250e+01 2.553e+01 2.783e+01 2.694e+02, threshold=5.105e+01, percent-clipped=1.0 2024-08-20 15:10:49,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4850480.0, ans=0.0 2024-08-20 15:11:00,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4850480.0, ans=0.2 2024-08-20 15:11:37,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4850680.0, ans=0.025 2024-08-20 15:11:44,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4850680.0, ans=0.2 2024-08-20 15:11:47,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4850680.0, ans=0.125 2024-08-20 15:11:57,395 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10900, loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.00016, whisper_loss=0.09034, over 21635.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01029, ecapa_loss=0.0001406, whisper_loss=0.09092, over 3812764.65 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:12:37,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-20 15:12:44,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-20 15:12:47,050 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.564e+00 2024-08-20 15:12:55,463 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-20 15:12:58,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4850980.0, ans=0.2 2024-08-20 15:13:09,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4851080.0, ans=0.125 2024-08-20 15:13:09,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4851080.0, ans=0.125 2024-08-20 15:13:53,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4851280.0, ans=0.0 2024-08-20 15:13:54,351 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 10950, loss[loss=0.08759, beats_loss=0.01106, ecapa_loss=0.0001582, whisper_loss=0.07495, over 18819.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.09056, over 3814926.08 frames. ], batch size: 80, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:14:21,760 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 15:14:23,150 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-20 15:14:40,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.141e+01 2.427e+01 2.900e+01 4.434e+01, threshold=4.855e+01, percent-clipped=0.0 2024-08-20 15:14:56,802 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 15:15:04,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4851580.0, ans=0.0 2024-08-20 15:15:31,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4851680.0, ans=10.0 2024-08-20 15:15:38,708 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06528465449810028, model_norm_threshold=48.54976272583008 2024-08-20 15:15:38,905 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.33, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.799e+05, grad_sumsq=1.799e+05, orig_rms_sq=1.000e+00 2024-08-20 15:15:47,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4851680.0, ans=0.0 2024-08-20 15:15:53,492 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11000, loss[loss=0.08163, beats_loss=0.01391, ecapa_loss=0.0001412, whisper_loss=0.06631, over 18574.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.09139, over 3832566.21 frames. ], batch size: 79, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:16:00,615 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-20 15:16:03,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4851780.0, ans=0.0 2024-08-20 15:16:12,311 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
18 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-20 15:16:23,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4851880.0, ans=0.1 2024-08-20 15:16:45,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4851980.0, ans=0.125 2024-08-20 15:17:47,118 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11050, loss[loss=0.1068, beats_loss=0.01195, ecapa_loss=0.0001235, whisper_loss=0.09357, over 18613.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001402, whisper_loss=0.09056, over 3813944.49 frames. ], batch size: 73, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:18:24,448 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-20 15:18:24,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4852380.0, ans=0.125 2024-08-20 15:18:35,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.285e+01 2.516e+01 2.757e+01 7.437e+02, threshold=5.033e+01, percent-clipped=2.0 2024-08-20 15:18:55,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4852580.0, ans=0.1 2024-08-20 15:19:27,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-20 15:19:46,485 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11100, loss[loss=0.08267, beats_loss=0.01072, ecapa_loss=0.0001491, whisper_loss=0.07046, over 21533.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.09013, over 3823317.98 frames. 
], batch size: 87, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:20:34,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4852980.0, ans=0.2 2024-08-20 15:20:41,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4852980.0, ans=0.0 2024-08-20 15:20:44,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4852980.0, ans=0.125 2024-08-20 15:21:00,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-20 15:21:36,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4853180.0, ans=0.125 2024-08-20 15:21:44,381 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11150, loss[loss=0.1179, beats_loss=0.0101, ecapa_loss=0.000167, whisper_loss=0.1061, over 21928.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.00014, whisper_loss=0.0903, over 3828306.81 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:21:51,651 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 15:22:22,894 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 
10 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-20 15:22:25,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4853380.0, ans=0.125 2024-08-20 15:22:30,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4853480.0, ans=0.0 2024-08-20 15:22:30,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.306e+01 2.527e+01 2.770e+01 3.887e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-20 15:23:06,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4853580.0, ans=0.0 2024-08-20 15:23:09,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4853580.0, ans=0.1 2024-08-20 15:23:21,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4853680.0, ans=0.0 2024-08-20 15:23:29,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4853680.0, ans=0.0 2024-08-20 15:23:43,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=12.0 2024-08-20 15:23:46,555 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11200, loss[loss=0.09225, beats_loss=0.01118, ecapa_loss=0.0001326, whisper_loss=0.07974, over 13636.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001396, whisper_loss=0.09018, over 3823691.41 frames. ], batch size: 53, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:23:49,233 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
31 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-20 15:23:54,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4853780.0, ans=0.1 2024-08-20 15:24:18,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4853880.0, ans=0.0 2024-08-20 15:24:32,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2024-08-20 15:24:42,823 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 15:24:50,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4853980.0, ans=0.0 2024-08-20 15:24:50,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4853980.0, ans=0.0 2024-08-20 15:24:58,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4853980.0, ans=0.1 2024-08-20 15:25:23,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4854080.0, ans=0.125 2024-08-20 15:25:39,334 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-20 15:25:44,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4854180.0, ans=0.1 2024-08-20 15:25:56,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4854280.0, ans=0.0 2024-08-20 15:25:56,979 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11250, loss[loss=0.1119, beats_loss=0.01065, ecapa_loss=0.0001276, whisper_loss=0.1, over 22777.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001381, whisper_loss=0.09041, over 3867965.33 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:26:29,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4854380.0, ans=0.1 2024-08-20 15:26:45,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.343e+01 2.555e+01 2.874e+01 4.205e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-20 15:27:10,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4854580.0, ans=0.2 2024-08-20 15:27:20,035 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-20 15:27:20,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4854580.0, ans=0.09899494936611666 2024-08-20 15:27:30,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. 
limit=6.0 2024-08-20 15:27:32,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4854680.0, ans=0.0 2024-08-20 15:27:47,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4854680.0, ans=0.125 2024-08-20 15:27:49,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4854680.0, ans=10.0 2024-08-20 15:27:57,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4854780.0, ans=0.0 2024-08-20 15:27:58,057 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11300, loss[loss=0.1171, beats_loss=0.008429, ecapa_loss=0.000138, whisper_loss=0.1073, over 19236.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001386, whisper_loss=0.09054, over 3857490.57 frames. ], batch size: 75, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:28:04,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4854780.0, ans=0.125 2024-08-20 15:28:12,034 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 21 from LS+wenet, 33 from Vox, 36 fro AS 2024-08-20 15:28:26,553 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
18 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-20 15:28:34,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4854880.0, ans=0.0 2024-08-20 15:28:46,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4854980.0, ans=0.125 2024-08-20 15:28:55,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4854980.0, ans=0.125 2024-08-20 15:29:40,681 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 22 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-20 15:29:44,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2024-08-20 15:29:59,447 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11350, loss[loss=0.1095, beats_loss=0.01111, ecapa_loss=0.0001359, whisper_loss=0.09702, over 21713.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001384, whisper_loss=0.0903, over 3855393.73 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:30:15,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4855280.0, ans=0.0 2024-08-20 15:30:18,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4855280.0, ans=0.2 2024-08-20 15:30:25,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4855380.0, ans=0.125 2024-08-20 15:30:28,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4855380.0, ans=0.125 2024-08-20 15:30:38,112 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
19 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 15:30:49,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+01 2.328e+01 2.511e+01 2.770e+01 2.674e+02, threshold=5.022e+01, percent-clipped=3.0 2024-08-20 15:31:05,960 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 15:31:18,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4855580.0, ans=0.0 2024-08-20 15:31:45,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0 2024-08-20 15:32:03,240 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11400, loss[loss=0.1036, beats_loss=0.008837, ecapa_loss=0.0001621, whisper_loss=0.09315, over 15764.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01031, ecapa_loss=0.0001379, whisper_loss=0.09032, over 3818108.99 frames. ], batch size: 62, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:32:30,972 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 15:32:58,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2024-08-20 15:33:44,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4856180.0, ans=0.125 2024-08-20 15:34:02,584 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11450, loss[loss=0.09622, beats_loss=0.01136, ecapa_loss=0.000167, whisper_loss=0.08319, over 13424.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001386, whisper_loss=0.08982, over 3830835.39 frames. 
], batch size: 55, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:34:30,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0 2024-08-20 15:34:53,075 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.356e+01 2.634e+01 3.043e+01 3.885e+01, threshold=5.268e+01, percent-clipped=0.0 2024-08-20 15:35:03,357 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 17 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-20 15:35:03,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4856480.0, ans=0.125 2024-08-20 15:35:09,018 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 15:35:26,964 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 25 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 15:36:01,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4856780.0, ans=0.0 2024-08-20 15:36:02,524 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11500, loss[loss=0.1086, beats_loss=0.009638, ecapa_loss=0.0001065, whisper_loss=0.09787, over 14971.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01027, ecapa_loss=0.0001387, whisper_loss=0.09067, over 3836311.86 frames. ], batch size: 54, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:36:27,268 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 28 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 15:36:35,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4856880.0, ans=0.125 2024-08-20 15:36:37,199 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 26 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 15:37:06,993 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-20 15:37:09,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4857080.0, ans=0.1 2024-08-20 15:37:23,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=15.0 2024-08-20 15:37:26,809 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 15:37:26,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4857080.0, ans=0.0 2024-08-20 15:37:31,803 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.833e-01 2024-08-20 15:37:31,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4857180.0, ans=0.04949747468305833 2024-08-20 15:37:53,876 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11550, loss[loss=0.08075, beats_loss=0.0112, ecapa_loss=0.000139, whisper_loss=0.06816, over 17225.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01037, ecapa_loss=0.0001383, whisper_loss=0.09005, over 3818208.06 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:37:59,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4857280.0, ans=0.025 2024-08-20 15:38:08,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4857280.0, ans=0.0 2024-08-20 15:38:09,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. 
limit=15.0 2024-08-20 15:38:30,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4857380.0, ans=0.0 2024-08-20 15:38:40,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.202e+01 2.508e+01 2.840e+01 4.143e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-20 15:38:46,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4857480.0, ans=0.1 2024-08-20 15:39:08,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4857580.0, ans=0.125 2024-08-20 15:39:19,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.83 vs. limit=6.0 2024-08-20 15:39:25,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4857680.0, ans=0.0 2024-08-20 15:39:44,029 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 27 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 15:39:46,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4857780.0, ans=0.1 2024-08-20 15:39:47,012 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11600, loss[loss=0.1159, beats_loss=0.009974, ecapa_loss=0.0001498, whisper_loss=0.1045, over 22607.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001372, whisper_loss=0.09067, over 3821363.31 frames. 
], batch size: 93, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:39:50,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4857780.0, ans=0.125 2024-08-20 15:39:52,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4857780.0, ans=0.125 2024-08-20 15:40:23,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4857880.0, ans=0.125 2024-08-20 15:41:07,904 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 15:41:12,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4858080.0, ans=0.1 2024-08-20 15:41:36,354 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11650, loss[loss=0.08531, beats_loss=0.01533, ecapa_loss=0.0001229, whisper_loss=0.06875, over 21796.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.09071, over 3827047.23 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:42:14,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-20 15:42:18,931 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 37 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 15:42:24,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.221e+01 2.529e+01 2.912e+01 8.219e+01, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 15:43:14,568 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
20 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 15:43:15,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.83 vs. limit=22.5 2024-08-20 15:43:34,408 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11700, loss[loss=0.1, beats_loss=0.01217, ecapa_loss=0.0001311, whisper_loss=0.08652, over 22108.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001396, whisper_loss=0.09027, over 3832081.39 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:44:08,739 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-20 15:44:18,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4858880.0, ans=0.125 2024-08-20 15:44:34,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4858980.0, ans=0.125 2024-08-20 15:44:49,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4859080.0, ans=0.0 2024-08-20 15:45:11,479 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 15:45:27,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2024-08-20 15:45:27,533 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11750, loss[loss=0.07709, beats_loss=0.01216, ecapa_loss=0.000158, whisper_loss=0.06335, over 20635.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.0001377, whisper_loss=0.08933, over 3819235.53 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:45:35,815 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 15:45:42,792 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 15:46:07,878 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 15:46:10,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.330e+01 2.512e+01 2.808e+01 3.989e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-20 15:46:18,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4859480.0, ans=0.025 2024-08-20 15:46:28,618 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 15:46:30,090 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 15:46:36,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4859580.0, ans=0.2 2024-08-20 15:46:40,797 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-20 15:46:45,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4859580.0, ans=0.125 2024-08-20 15:46:50,061 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03734064847230911, model_norm_threshold=50.2408561706543 2024-08-20 15:46:50,233 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.797e+05, grad_sumsq=2.797e+05, orig_rms_sq=1.000e+00 2024-08-20 15:47:14,977 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11800, loss[loss=0.09007, beats_loss=0.01098, ecapa_loss=0.0001742, whisper_loss=0.07734, over 14191.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.08993, over 3811148.24 frames. ], batch size: 61, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:47:16,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=15.0 2024-08-20 15:48:06,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4859980.0, ans=0.125 2024-08-20 15:48:07,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4859980.0, ans=0.0 2024-08-20 15:48:57,040 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 21 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-20 15:48:58,015 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11850, loss[loss=0.08596, beats_loss=0.01255, ecapa_loss=0.0001496, whisper_loss=0.07192, over 21406.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001388, whisper_loss=0.0895, over 3842764.34 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:49:18,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4860380.0, ans=0.1 2024-08-20 15:49:36,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.263e+01 2.480e+01 2.849e+01 1.345e+03, threshold=4.961e+01, percent-clipped=1.0 2024-08-20 15:49:51,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4860480.0, ans=0.2 2024-08-20 15:49:56,135 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 15:50:00,330 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 15:50:00,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.48 vs. limit=22.5 2024-08-20 15:50:34,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4860680.0, ans=0.95 2024-08-20 15:50:36,112 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 15:50:38,970 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11900, loss[loss=0.1048, beats_loss=0.01127, ecapa_loss=0.0001374, whisper_loss=0.09218, over 22240.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001394, whisper_loss=0.08994, over 3866780.71 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:50:43,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0 2024-08-20 15:50:46,381 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-20 15:50:48,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4860780.0, ans=0.05 2024-08-20 15:50:56,584 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-20 15:51:07,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4860880.0, ans=0.2 2024-08-20 15:51:23,739 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 15:51:30,423 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
27 from LS+wenet, 9 from Vox, 26 from AS 2024-08-20 15:51:49,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4861080.0, ans=0.2 2024-08-20 15:52:06,416 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 from AS 2024-08-20 15:52:17,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4861180.0, ans=0.125 2024-08-20 15:52:22,949 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 11950, loss[loss=0.1338, beats_loss=0.007978, ecapa_loss=0.0001379, whisper_loss=0.1244, over 23253.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01033, ecapa_loss=0.0001399, whisper_loss=0.09113, over 3884716.29 frames. ], batch size: 87, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:52:24,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4861280.0, ans=0.125 2024-08-20 15:52:26,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4861280.0, ans=0.125 2024-08-20 15:52:41,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4861280.0, ans=0.125 2024-08-20 15:52:43,294 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 15:53:03,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4861380.0, ans=0.2 2024-08-20 15:53:06,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.535e+01 2.342e+01 2.523e+01 2.820e+01 2.544e+02, threshold=5.046e+01, percent-clipped=1.0 2024-08-20 15:53:10,095 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4861480.0, ans=0.125 2024-08-20 15:53:12,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4861480.0, ans=0.0 2024-08-20 15:53:51,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4861680.0, ans=0.1 2024-08-20 15:54:01,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4861680.0, ans=0.0 2024-08-20 15:54:13,284 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12000, loss[loss=0.0824, beats_loss=0.01096, ecapa_loss=0.0001263, whisper_loss=0.07018, over 20773.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001387, whisper_loss=0.09092, over 3894326.43 frames. ], batch size: 83, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:54:13,285 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 15:54:48,572 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on ASR_libri: loss=0.2555, beats_loss=0, ecapa_loss=0.000501, whisper_loss=0.2505, over 931116.00 frames. 2024-08-20 15:55:14,173 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on SV_voxceleb1: loss=0.003892, beats_loss=0, ecapa_loss=0.0003892, whisper_loss=0, over 944235.00 frames. 2024-08-20 15:56:55,468 INFO [train_multi_KD3.py:1150] (0/4) Epoch 33, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 15:56:55,472 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 15:57:07,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4861780.0, ans=0.2 2024-08-20 15:57:09,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4861780.0, ans=0.125 2024-08-20 15:57:11,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-08-20 15:57:19,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4861880.0, ans=0.0 2024-08-20 15:57:24,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4861880.0, ans=0.2 2024-08-20 15:57:32,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4861980.0, ans=0.0 2024-08-20 15:57:37,266 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 from AS 2024-08-20 15:57:40,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4861980.0, ans=0.125 2024-08-20 15:57:46,053 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 30 from LS+wenet, 24 from Vox, 41 from AS 2024-08-20 15:58:18,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4862280.0, ans=0.1 2024-08-20 15:58:19,397 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12050, loss[loss=0.09685, beats_loss=0.01152, ecapa_loss=0.0001407, whisper_loss=0.08393, over 20633.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001385, whisper_loss=0.09089, over 3867981.07 frames. 
], batch size: 86, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:58:28,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=22.5 2024-08-20 15:58:40,698 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 30 from LS+wenet, 14 from Vox, 44 from AS 2024-08-20 15:58:44,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=22.5 2024-08-20 15:58:47,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4862380.0, ans=0.125 2024-08-20 15:58:53,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.378e+01 2.665e+01 2.948e+01 5.073e+01, threshold=5.329e+01, percent-clipped=1.0 2024-08-20 15:58:53,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4862480.0, ans=0.125 2024-08-20 15:58:55,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4862480.0, ans=0.0 2024-08-20 15:59:09,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4862580.0, ans=0.1 2024-08-20 15:59:33,121 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 from AS 2024-08-20 15:59:36,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4862680.0, ans=0.0 2024-08-20 15:59:44,455 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12100, loss[loss=0.08193, beats_loss=0.01213, ecapa_loss=0.0001374, whisper_loss=0.06842, over 20957.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.0912, over 3833791.77 frames. 
], batch size: 86, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 15:59:48,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4862780.0, ans=0.0 2024-08-20 15:59:48,206 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 15:59:49,920 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 19 from LS+wenet, 26 from Vox, 34 from AS 2024-08-20 16:00:00,411 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 from AS 2024-08-20 16:00:06,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4862880.0, ans=0.125 2024-08-20 16:00:12,023 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS 2024-08-20 16:00:13,600 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 16 from LS+wenet, 13 from Vox, 24 from AS 2024-08-20 16:00:40,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-20 16:00:42,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4863080.0, ans=0.125 2024-08-20 16:00:51,728 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 20 from LS+wenet, 17 from Vox, 40 from AS 2024-08-20 16:00:59,521 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 from AS 2024-08-20 16:01:07,085 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12150, loss[loss=0.1284, beats_loss=0.0111, ecapa_loss=0.000116, whisper_loss=0.1161, over 23959.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.0001394, whisper_loss=0.09033, over 3835072.19 frames. 
], batch size: 89, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:01:12,550 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 from AS 2024-08-20 16:01:39,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.292e+01 2.549e+01 2.868e+01 6.331e+01, threshold=5.097e+01, percent-clipped=1.0 2024-08-20 16:02:04,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4863580.0, ans=0.1 2024-08-20 16:02:25,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4863680.0, ans=0.2 2024-08-20 16:02:28,286 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12200, loss[loss=0.1015, beats_loss=0.01157, ecapa_loss=0.0001348, whisper_loss=0.08862, over 16940.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.09027, over 3773784.85 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:02:30,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4863780.0, ans=0.0 2024-08-20 16:02:42,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2024-08-20 16:02:48,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4863880.0, ans=0.125 2024-08-20 16:03:06,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4863980.0, ans=0.07 2024-08-20 16:03:12,259 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 from AS 2024-08-20 16:03:36,861 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
33 from LS+wenet, 16 from Vox, 40 from AS 2024-08-20 16:03:39,890 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 30 from LS+wenet, 15 from Vox, 34 from AS 2024-08-20 16:03:40,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4864180.0, ans=0.0 2024-08-20 16:03:41,609 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 17 from LS+wenet, 22 from Vox, 42 from AS 2024-08-20 16:03:48,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4864280.0, ans=0.125 2024-08-20 16:03:49,864 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12250, loss[loss=0.1219, beats_loss=0.008333, ecapa_loss=0.0001196, whisper_loss=0.1124, over 20323.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001405, whisper_loss=0.09028, over 3741958.13 frames. ], batch size: 73, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:03:53,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4864280.0, ans=0.125 2024-08-20 16:04:06,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4864380.0, ans=0.125 2024-08-20 16:04:21,819 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.269e+01 2.404e+01 2.750e+01 9.360e+01, threshold=4.808e+01, percent-clipped=1.0 2024-08-20 16:05:01,315 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.058e+05 2024-08-20 16:05:06,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4864680.0, ans=0.125 2024-08-20 16:05:11,856 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12300, loss[loss=0.1079, beats_loss=0.01095, ecapa_loss=0.0001367, whisper_loss=0.09556, over 20412.00 
frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001408, whisper_loss=0.09004, over 3768674.91 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:05:13,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4864780.0, ans=0.125 2024-08-20 16:05:36,379 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05332305282354355, model_norm_threshold=48.08091354370117 2024-08-20 16:05:36,548 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.333e+05, grad_sumsq=1.333e+05, orig_rms_sq=1.000e+00 2024-08-20 16:05:38,172 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 from AS 2024-08-20 16:05:44,762 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 15 from LS+wenet, 22 from Vox, 24 from AS 2024-08-20 16:06:02,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4865080.0, ans=0.125 2024-08-20 16:06:04,034 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 17 from LS+wenet, 18 from Vox, 27 from AS 2024-08-20 16:06:30,870 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 20 from Vox, 45 from AS 2024-08-20 16:06:34,546 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12350, loss[loss=0.08861, beats_loss=0.01153, ecapa_loss=0.00011, whisper_loss=0.07599, over 23277.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001405, whisper_loss=0.08956, over 3773629.21 frames. 
], batch size: 92, lr: 1.84e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 16:06:52,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4865380.0, ans=0.125 2024-08-20 16:07:06,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4865380.0, ans=0.0 2024-08-20 16:07:08,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.318e+01 2.528e+01 2.855e+01 9.017e+02, threshold=5.055e+01, percent-clipped=1.0 2024-08-20 16:07:13,741 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS 2024-08-20 16:07:15,567 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS 2024-08-20 16:07:19,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4865480.0, ans=0.1 2024-08-20 16:07:21,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4865480.0, ans=0.2 2024-08-20 16:07:26,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4865580.0, ans=0.0 2024-08-20 16:07:43,081 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
27 from LS+wenet, 14 from Vox, 32 from AS 2024-08-20 16:07:47,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4865680.0, ans=0.125 2024-08-20 16:07:52,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4865680.0, ans=0.125 2024-08-20 16:07:59,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4865780.0, ans=0.125 2024-08-20 16:08:00,598 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12400, loss[loss=0.09251, beats_loss=0.01168, ecapa_loss=9.837e-05, whisper_loss=0.07985, over 18346.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.08938, over 3787057.04 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:08:14,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4865780.0, ans=0.0 2024-08-20 16:08:20,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4865880.0, ans=0.125 2024-08-20 16:08:22,005 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-20 16:08:24,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4865880.0, ans=0.1 2024-08-20 16:08:32,954 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 29 from LS+wenet, 17 from Vox, 28 from AS 2024-08-20 16:08:33,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4865880.0, ans=0.125 2024-08-20 16:08:36,557 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
20 from LS+wenet, 14 from Vox, 28 from AS 2024-08-20 16:08:45,065 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 16:09:16,503 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 12 from LS+wenet, 20 from Vox, 17 from AS 2024-08-20 16:09:39,953 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12450, loss[loss=0.1033, beats_loss=0.009513, ecapa_loss=0.000136, whisper_loss=0.09245, over 19244.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001399, whisper_loss=0.09019, over 3794341.30 frames. ], batch size: 78, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:09:52,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4866280.0, ans=0.1 2024-08-20 16:10:22,394 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.270e+01 2.513e+01 2.843e+01 4.408e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-20 16:10:31,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4866480.0, ans=0.1 2024-08-20 16:10:35,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4866480.0, ans=0.1 2024-08-20 16:10:45,456 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 9 from LS+wenet, 14 from Vox, 28 from AS 2024-08-20 16:10:58,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4866580.0, ans=0.1 2024-08-20 16:11:23,880 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12500, loss[loss=0.08531, beats_loss=0.01051, ecapa_loss=0.0001345, whisper_loss=0.07346, over 18537.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.09066, over 3788755.33 frames. 
], batch size: 75, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:11:30,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4866780.0, ans=0.0 2024-08-20 16:11:33,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4866780.0, ans=0.1 2024-08-20 16:11:33,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=12.0 2024-08-20 16:11:43,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4866880.0, ans=0.05 2024-08-20 16:11:45,960 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 27 from LS+wenet, 12 from Vox, 28 from AS 2024-08-20 16:11:52,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4866880.0, ans=0.0 2024-08-20 16:12:02,233 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 from AS 2024-08-20 16:12:16,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-08-20 16:12:54,060 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS 2024-08-20 16:13:10,392 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 from AS 2024-08-20 16:13:15,595 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12550, loss[loss=0.08655, beats_loss=0.01046, ecapa_loss=0.0001588, whisper_loss=0.0745, over 21521.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09074, over 3804877.88 frames. 
], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:13:18,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. limit=10.0 2024-08-20 16:13:40,917 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 12 from LS+wenet, 15 from Vox, 28 from AS 2024-08-20 16:13:45,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4867380.0, ans=0.125 2024-08-20 16:13:50,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2024-08-20 16:13:55,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0 2024-08-20 16:13:59,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4867380.0, ans=0.125 2024-08-20 16:14:02,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-20 16:14:02,669 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.459e+01 2.718e+01 3.101e+01 5.496e+01, threshold=5.435e+01, percent-clipped=1.0 2024-08-20 16:14:04,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4867480.0, ans=0.125 2024-08-20 16:14:30,273 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 19 from LS+wenet, 21 from Vox, 41 from AS 2024-08-20 16:14:45,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. 
limit=15.0 2024-08-20 16:15:13,166 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12600, loss[loss=0.06743, beats_loss=0.01493, ecapa_loss=0.0001152, whisper_loss=0.05134, over 14812.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001405, whisper_loss=0.0904, over 3825295.34 frames. ], batch size: 60, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:15:47,943 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 27 from LS+wenet, 23 from Vox, 30 from AS 2024-08-20 16:16:08,083 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 from AS 2024-08-20 16:16:08,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-08-20 16:16:10,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4867980.0, ans=0.125 2024-08-20 16:16:21,776 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS 2024-08-20 16:16:24,011 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 22 from Vox, 20 from AS 2024-08-20 16:16:26,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4868080.0, ans=0.1 2024-08-20 16:16:28,814 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 15 from LS+wenet, 22 from Vox, 36 from AS 2024-08-20 16:16:35,197 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 22 from LS+wenet, 14 from Vox, 14 from AS 2024-08-20 16:16:43,861 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 from AS 2024-08-20 16:17:03,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs. 
limit=12.0 2024-08-20 16:17:05,907 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12650, loss[loss=0.1166, beats_loss=0.009203, ecapa_loss=0.0001397, whisper_loss=0.106, over 19529.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001399, whisper_loss=0.09078, over 3808086.95 frames. ], batch size: 77, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:17:16,459 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 11 from LS+wenet, 13 from Vox, 26 from AS 2024-08-20 16:17:51,142 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.312e+01 2.541e+01 2.719e+01 3.789e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-20 16:17:57,672 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 from AS 2024-08-20 16:18:21,660 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 from AS 2024-08-20 16:18:47,348 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 33 from LS+wenet, 25 from Vox, 28 from AS 2024-08-20 16:18:49,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4868680.0, ans=0.2 2024-08-20 16:18:50,662 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 from AS 2024-08-20 16:18:55,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4868680.0, ans=0.0 2024-08-20 16:18:58,745 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12700, loss[loss=0.09068, beats_loss=0.01133, ecapa_loss=0.0001345, whisper_loss=0.078, over 23715.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01038, ecapa_loss=0.0001395, whisper_loss=0.09045, over 3828807.22 frames. ], batch size: 96, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:19:00,571 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
25 from LS+wenet, 24 from Vox, 31 from AS 2024-08-20 16:19:13,540 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 29 from LS+wenet, 16 from Vox, 25 from AS 2024-08-20 16:19:28,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4868880.0, ans=0.0 2024-08-20 16:19:32,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4868880.0, ans=0.2 2024-08-20 16:19:49,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4868980.0, ans=0.125 2024-08-20 16:20:48,279 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 from AS 2024-08-20 16:20:51,321 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12750, loss[loss=0.1118, beats_loss=0.008813, ecapa_loss=0.0001195, whisper_loss=0.1018, over 13689.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01026, ecapa_loss=0.0001398, whisper_loss=0.09104, over 3800564.89 frames. ], batch size: 51, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:21:05,361 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 14 from Vox, 28 from AS 2024-08-20 16:21:12,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4869380.0, ans=0.125 2024-08-20 16:21:27,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4869380.0, ans=0.2 2024-08-20 16:21:35,114 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.927e+01 2.357e+01 2.635e+01 3.039e+01 5.268e+01, threshold=5.270e+01, percent-clipped=2.0 2024-08-20 16:21:41,652 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
23 from LS+wenet, 18 from Vox, 31 from AS 2024-08-20 16:21:58,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4869580.0, ans=0.125 2024-08-20 16:22:01,982 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 26 from LS+wenet, 17 from Vox, 30 from AS 2024-08-20 16:22:04,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4869580.0, ans=0.05 2024-08-20 16:22:09,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4869580.0, ans=0.0 2024-08-20 16:22:34,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-20 16:22:36,639 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12800, loss[loss=0.1009, beats_loss=0.008419, ecapa_loss=0.0001672, whisper_loss=0.09083, over 22240.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01024, ecapa_loss=0.0001404, whisper_loss=0.09055, over 3766268.25 frames. ], batch size: 92, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:22:38,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=10.0 2024-08-20 16:22:49,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4869780.0, ans=0.125 2024-08-20 16:23:26,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4869980.0, ans=0.2 2024-08-20 16:23:55,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.81 vs. 
limit=15.0 2024-08-20 16:24:09,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4870180.0, ans=0.125 2024-08-20 16:24:26,342 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12850, loss[loss=0.08687, beats_loss=0.01274, ecapa_loss=0.0001179, whisper_loss=0.07296, over 21975.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01026, ecapa_loss=0.0001403, whisper_loss=0.09042, over 3782384.45 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:24:51,877 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 from AS 2024-08-20 16:25:11,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.389e+01 2.612e+01 2.924e+01 4.831e+01, threshold=5.224e+01, percent-clipped=0.0 2024-08-20 16:25:34,356 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 from AS 2024-08-20 16:25:36,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4870580.0, ans=10.0 2024-08-20 16:25:44,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4870580.0, ans=0.125 2024-08-20 16:25:46,696 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 24 from LS+wenet, 29 from Vox, 41 from AS 2024-08-20 16:26:12,663 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12900, loss[loss=0.0948, beats_loss=0.01081, ecapa_loss=0.0001685, whisper_loss=0.0823, over 22116.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001411, whisper_loss=0.08986, over 3768240.72 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:26:35,615 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 16:26:46,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4870880.0, ans=0.1 2024-08-20 16:26:50,309 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 16:27:00,628 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 20 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-20 16:27:00,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.640e-01 2024-08-20 16:27:36,809 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 16:27:58,873 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 12950, loss[loss=0.1108, beats_loss=0.009896, ecapa_loss=0.0001281, whisper_loss=0.0996, over 22485.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08986, over 3790197.24 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:28:16,897 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 16:28:33,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4871380.0, ans=0.015 2024-08-20 16:28:40,004 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.303e+01 2.529e+01 2.820e+01 1.360e+02, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 16:28:45,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4871480.0, ans=0.0 2024-08-20 16:28:48,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4871480.0, ans=0.95 2024-08-20 16:29:29,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-08-20 16:29:40,538 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08873618394136429, model_norm_threshold=50.58396911621094 2024-08-20 16:29:40,710 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.213e+04, grad_sumsq=8.006e+03, orig_rms_sq=9.010e+00 2024-08-20 16:29:47,550 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13000, loss[loss=0.1131, beats_loss=0.01063, ecapa_loss=0.0001163, whisper_loss=0.1013, over 16642.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001406, whisper_loss=0.08975, over 3767808.36 frames. ], batch size: 66, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:30:12,986 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 16:30:33,834 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08446179330348969, model_norm_threshold=50.58396911621094 2024-08-20 16:30:34,004 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.496e+04, grad_sumsq=4.191e+06, orig_rms_sq=1.073e-02 2024-08-20 16:30:39,760 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 30 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 16:31:03,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4872080.0, ans=0.125 2024-08-20 16:31:09,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4872080.0, ans=0.0 2024-08-20 16:31:14,104 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 12 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 16:31:30,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4872180.0, ans=0.0 2024-08-20 16:31:34,567 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-20 16:31:39,797 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13050, loss[loss=0.1015, beats_loss=0.01207, ecapa_loss=0.0001229, whisper_loss=0.08823, over 22186.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01058, ecapa_loss=0.0001394, whisper_loss=0.08937, over 3816302.32 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:31:58,886 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 16:31:59,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4872280.0, ans=10.0 2024-08-20 16:32:03,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4872380.0, ans=0.2 2024-08-20 16:32:21,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.399e+01 2.559e+01 2.848e+01 5.989e+02, threshold=5.117e+01, percent-clipped=3.0 2024-08-20 16:32:58,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4872580.0, ans=0.125 2024-08-20 16:33:21,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4872680.0, ans=0.0 2024-08-20 16:33:26,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4872780.0, ans=0.125 2024-08-20 16:33:27,454 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13100, loss[loss=0.109, beats_loss=0.008917, ecapa_loss=0.0001602, whisper_loss=0.09847, over 21573.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0106, ecapa_loss=0.0001402, whisper_loss=0.08858, over 3779646.71 frames. 
], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:33:28,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4872780.0, ans=0.125 2024-08-20 16:33:49,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4872880.0, ans=0.125 2024-08-20 16:34:24,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4872980.0, ans=0.125 2024-08-20 16:35:23,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2024-08-20 16:35:23,945 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13150, loss[loss=0.07125, beats_loss=0.01229, ecapa_loss=0.0001391, whisper_loss=0.05757, over 20660.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01049, ecapa_loss=0.0001411, whisper_loss=0.08852, over 3772700.74 frames. ], batch size: 86, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:35:24,254 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-20 16:35:31,853 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 24 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 16:35:34,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=12.0 2024-08-20 16:35:54,791 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 16:36:04,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4873380.0, ans=0.125 2024-08-20 16:36:09,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4873480.0, ans=0.0 2024-08-20 16:36:10,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.480e+01 2.703e+01 4.896e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-20 16:36:12,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4873480.0, ans=0.125 2024-08-20 16:36:15,355 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-20 16:36:20,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4873480.0, ans=0.0 2024-08-20 16:36:40,833 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 16:36:55,575 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 27 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-20 16:37:15,789 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13200, loss[loss=0.08543, beats_loss=0.009383, ecapa_loss=0.000134, whisper_loss=0.07471, over 14905.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001407, whisper_loss=0.08929, over 3792258.69 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:37:16,866 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 16:37:19,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4873780.0, ans=0.0 2024-08-20 16:37:23,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4873780.0, ans=0.1 2024-08-20 16:37:27,486 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 18 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-20 16:37:41,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4873880.0, ans=0.125 2024-08-20 16:38:32,187 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 18 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 16:39:05,624 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13250, loss[loss=0.09262, beats_loss=0.008785, ecapa_loss=0.0001416, whisper_loss=0.08242, over 16808.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001411, whisper_loss=0.08985, over 3797158.42 frames. ], batch size: 69, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:39:22,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-08-20 16:39:47,284 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.277e+01 2.601e+01 3.009e+01 4.180e+01, threshold=5.201e+01, percent-clipped=0.0 2024-08-20 16:39:48,242 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
20 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 16:40:05,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4874480.0, ans=0.2 2024-08-20 16:40:18,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4874580.0, ans=0.0 2024-08-20 16:40:26,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4874580.0, ans=0.125 2024-08-20 16:40:31,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4874680.0, ans=0.125 2024-08-20 16:40:39,660 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 16:40:44,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4874680.0, ans=0.07 2024-08-20 16:40:47,660 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 16 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-20 16:40:48,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2024-08-20 16:40:51,045 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13300, loss[loss=0.1294, beats_loss=0.007663, ecapa_loss=0.0001696, whisper_loss=0.12, over 20720.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001422, whisper_loss=0.08923, over 3781466.09 frames. ], batch size: 82, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:40:52,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4874780.0, ans=0.125 2024-08-20 16:40:52,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.51 vs. 
limit=15.0 2024-08-20 16:41:11,222 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 16:41:22,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2024-08-20 16:42:20,894 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-20 16:42:25,792 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 20 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 16:42:34,028 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 16:42:40,148 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13350, loss[loss=0.1012, beats_loss=0.008165, ecapa_loss=0.0001359, whisper_loss=0.09167, over 19015.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.08953, over 3759034.63 frames. ], batch size: 71, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:42:54,196 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 11 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-20 16:43:11,348 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
14 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 16:43:23,133 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.248e+01 2.436e+01 2.816e+01 2.858e+02, threshold=4.871e+01, percent-clipped=3.0 2024-08-20 16:43:51,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4875580.0, ans=0.125 2024-08-20 16:43:56,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4875580.0, ans=0.0 2024-08-20 16:43:58,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4875580.0, ans=0.2 2024-08-20 16:44:34,399 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13400, loss[loss=0.1181, beats_loss=0.008328, ecapa_loss=0.0001204, whisper_loss=0.1085, over 20754.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001412, whisper_loss=0.09008, over 3761630.70 frames. ], batch size: 78, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:45:00,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2024-08-20 16:45:08,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4875880.0, ans=0.125 2024-08-20 16:45:17,454 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 31 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 16:45:24,628 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 21 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 16:45:30,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. 
limit=6.0 2024-08-20 16:45:38,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4875980.0, ans=0.1 2024-08-20 16:46:27,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4876180.0, ans=0.125 2024-08-20 16:46:32,342 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-20 16:46:33,350 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13450, loss[loss=0.1073, beats_loss=0.009404, ecapa_loss=0.0001565, whisper_loss=0.09634, over 22355.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01024, ecapa_loss=0.0001418, whisper_loss=0.09105, over 3779084.31 frames. ], batch size: 89, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:46:51,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4876280.0, ans=0.125 2024-08-20 16:47:20,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.341e+01 2.518e+01 2.794e+01 2.882e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-20 16:47:43,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4876580.0, ans=0.1 2024-08-20 16:48:03,564 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 24 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-20 16:48:22,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4876680.0, ans=0.07 2024-08-20 16:48:25,853 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13500, loss[loss=0.1176, beats_loss=0.008564, ecapa_loss=0.0001176, whisper_loss=0.1079, over 18310.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001411, whisper_loss=0.09069, over 3805810.09 frames. 
], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:48:44,501 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 24 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 16:48:44,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4876780.0, ans=0.0 2024-08-20 16:48:54,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4876880.0, ans=0.125 2024-08-20 16:48:54,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-08-20 16:49:01,895 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-20 16:49:39,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4877080.0, ans=0.0 2024-08-20 16:50:12,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4877180.0, ans=0.125 2024-08-20 16:50:14,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=15.0 2024-08-20 16:50:16,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-20 16:50:21,771 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13550, loss[loss=0.1051, beats_loss=0.008525, ecapa_loss=0.0001585, whisper_loss=0.095, over 15051.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0103, ecapa_loss=0.0001399, whisper_loss=0.09044, over 3791957.02 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:50:40,703 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 
12 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 16:50:43,727 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 29 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-20 16:50:57,932 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 16:50:58,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4877380.0, ans=0.07 2024-08-20 16:51:00,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4877380.0, ans=0.125 2024-08-20 16:51:10,946 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.306e+01 2.534e+01 2.808e+01 5.425e+01, threshold=5.068e+01, percent-clipped=1.0 2024-08-20 16:51:33,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.84 vs. limit=12.0 2024-08-20 16:51:37,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4877580.0, ans=0.125 2024-08-20 16:51:50,115 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 20 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-20 16:52:23,273 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13600, loss[loss=0.09515, beats_loss=0.009854, ecapa_loss=0.0001248, whisper_loss=0.08405, over 17024.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01019, ecapa_loss=0.0001405, whisper_loss=0.0909, over 3783152.26 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:52:39,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4877780.0, ans=0.1 2024-08-20 16:52:41,402 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 16:52:44,331 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
25 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 16:52:48,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5 2024-08-20 16:53:29,164 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 16 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-20 16:53:31,846 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 23 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-20 16:53:39,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4878080.0, ans=0.125 2024-08-20 16:53:53,813 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 16:54:21,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.88 vs. limit=6.0 2024-08-20 16:54:24,387 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13650, loss[loss=0.09724, beats_loss=0.0125, ecapa_loss=0.000145, whisper_loss=0.08329, over 21260.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01027, ecapa_loss=0.0001405, whisper_loss=0.09056, over 3792822.25 frames. 
], batch size: 92, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:54:49,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4878380.0, ans=0.1 2024-08-20 16:54:59,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4878380.0, ans=0.2 2024-08-20 16:55:08,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4878380.0, ans=0.2 2024-08-20 16:55:11,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.380e+01 2.594e+01 2.939e+01 1.944e+02, threshold=5.188e+01, percent-clipped=3.0 2024-08-20 16:55:36,620 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-20 16:56:03,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4878680.0, ans=0.125 2024-08-20 16:56:03,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4878680.0, ans=0.125 2024-08-20 16:56:07,901 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-20 16:56:08,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4878680.0, ans=0.1 2024-08-20 16:56:23,410 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13700, loss[loss=0.08758, beats_loss=0.01033, ecapa_loss=0.000144, whisper_loss=0.07582, over 14126.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01026, ecapa_loss=0.0001414, whisper_loss=0.09076, over 3810657.31 frames. 
], batch size: 57, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:56:32,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4878780.0, ans=0.0 2024-08-20 16:56:46,183 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 16:57:02,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4878880.0, ans=10.0 2024-08-20 16:57:09,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4878980.0, ans=0.125 2024-08-20 16:57:18,806 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 18 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-20 16:57:27,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.26 vs. limit=22.5 2024-08-20 16:57:30,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4879080.0, ans=0.1 2024-08-20 16:58:17,615 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13750, loss[loss=0.08589, beats_loss=0.0109, ecapa_loss=0.0001258, whisper_loss=0.07373, over 22145.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.0001416, whisper_loss=0.09037, over 3838110.45 frames. ], batch size: 88, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 16:58:49,845 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 16:59:00,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4879380.0, ans=0.0 2024-08-20 16:59:03,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.268e+01 2.560e+01 2.819e+01 5.576e+01, threshold=5.121e+01, percent-clipped=1.0 2024-08-20 16:59:12,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4879480.0, ans=0.0 2024-08-20 16:59:19,604 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 17:00:10,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4879680.0, ans=0.125 2024-08-20 17:00:15,482 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13800, loss[loss=0.1144, beats_loss=0.009587, ecapa_loss=0.0001354, whisper_loss=0.1034, over 22886.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01029, ecapa_loss=0.0001414, whisper_loss=0.09009, over 3786849.55 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:00:24,123 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 17:00:32,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4879780.0, ans=0.125 2024-08-20 17:00:39,393 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 20 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-20 17:00:51,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4879880.0, ans=0.125 2024-08-20 17:00:57,439 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 17:01:03,216 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-488000.pt 2024-08-20 17:01:12,627 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 17:01:17,090 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-20 17:01:24,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4880080.0, ans=0.0 2024-08-20 17:01:46,436 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 15 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 17:02:01,592 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-20 17:02:14,269 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13850, loss[loss=0.1232, beats_loss=0.008743, ecapa_loss=0.0001368, whisper_loss=0.113, over 24290.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001408, whisper_loss=0.09006, over 3790491.95 frames. ], batch size: 91, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:02:18,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2024-08-20 17:02:22,870 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 16 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 17:02:40,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-08-20 17:02:46,890 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-20 17:03:00,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4880480.0, ans=0.0 2024-08-20 17:03:01,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.552e+01 2.225e+01 2.417e+01 2.709e+01 3.540e+01, threshold=4.834e+01, percent-clipped=0.0 2024-08-20 17:03:03,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4880480.0, ans=0.125 2024-08-20 17:03:16,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2024-08-20 17:03:18,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4880480.0, ans=0.0 2024-08-20 17:03:25,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4880580.0, ans=0.0 2024-08-20 17:03:39,298 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 23 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-20 17:03:56,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4880680.0, ans=0.0 2024-08-20 17:04:10,313 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13900, loss[loss=0.1037, beats_loss=0.009609, ecapa_loss=0.0001546, whisper_loss=0.09253, over 18051.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.0899, over 3780522.91 frames. ], batch size: 75, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:04:16,539 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 17:04:25,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.33 vs. 
limit=15.0 2024-08-20 17:04:37,747 WARNING [optim.py:496] (0/4) Scaling gradients by 0.016832223162055016, model_norm_threshold=48.33732604980469 2024-08-20 17:04:37,916 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.526e+05, grad_sumsq=7.526e+05, orig_rms_sq=1.000e+00 2024-08-20 17:04:48,369 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 17:05:41,534 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 21 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-20 17:05:41,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4881180.0, ans=0.0 2024-08-20 17:05:57,361 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 13950, loss[loss=0.1137, beats_loss=0.009229, ecapa_loss=0.0001286, whisper_loss=0.1032, over 16453.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.09002, over 3796995.50 frames. ], batch size: 64, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:06:02,714 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 17:06:40,055 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.333e+01 2.602e+01 2.929e+01 2.872e+03, threshold=5.204e+01, percent-clipped=2.0 2024-08-20 17:06:49,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4881480.0, ans=0.125 2024-08-20 17:07:04,419 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
28 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 17:07:13,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4881580.0, ans=0.2 2024-08-20 17:07:24,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-08-20 17:07:27,375 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-20 17:07:34,034 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-20 17:07:41,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2024-08-20 17:07:43,295 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14000, loss[loss=0.07461, beats_loss=0.01319, ecapa_loss=0.0001487, whisper_loss=0.05993, over 15851.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.08961, over 3803847.92 frames. ], batch size: 70, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:08:00,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4881780.0, ans=0.2 2024-08-20 17:08:18,831 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 17:08:31,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4881980.0, ans=0.1 2024-08-20 17:08:45,005 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
26 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 17:08:49,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4881980.0, ans=0.125 2024-08-20 17:09:02,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4882080.0, ans=0.125 2024-08-20 17:09:05,361 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 28 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-20 17:09:17,359 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 17:09:31,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4882180.0, ans=0.2 2024-08-20 17:09:37,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4882280.0, ans=0.2 2024-08-20 17:09:38,436 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14050, loss[loss=0.0534, beats_loss=0.0106, ecapa_loss=0.0001523, whisper_loss=0.04128, over 12765.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001403, whisper_loss=0.08993, over 3795993.22 frames. 
], batch size: 52, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:09:53,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4882280.0, ans=0.125 2024-08-20 17:10:25,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.662e+01 2.229e+01 2.456e+01 2.749e+01 5.293e+01, threshold=4.913e+01, percent-clipped=1.0 2024-08-20 17:10:28,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4882480.0, ans=0.0 2024-08-20 17:10:31,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4882480.0, ans=0.125 2024-08-20 17:10:36,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4882480.0, ans=0.125 2024-08-20 17:10:41,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4882480.0, ans=0.0 2024-08-20 17:10:41,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4882480.0, ans=0.0 2024-08-20 17:10:55,602 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 36 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 17:10:55,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4882580.0, ans=0.2 2024-08-20 17:11:02,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4882580.0, ans=0.125 2024-08-20 17:11:08,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-08-20 17:11:13,105 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 
25 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-20 17:11:15,199 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 21 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-20 17:11:22,183 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 18 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-20 17:11:36,463 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14100, loss[loss=0.0959, beats_loss=0.01091, ecapa_loss=0.000152, whisper_loss=0.08347, over 14024.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001396, whisper_loss=0.09003, over 3782189.57 frames. ], batch size: 58, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:12:05,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2024-08-20 17:12:08,179 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 32 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 17:12:42,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=15.0 2024-08-20 17:12:59,960 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-20 17:13:32,710 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 28 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-20 17:13:33,717 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14150, loss[loss=0.1103, beats_loss=0.008435, ecapa_loss=0.0001499, whisper_loss=0.1003, over 20470.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001391, whisper_loss=0.08976, over 3776291.74 frames. ], batch size: 83, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:13:36,973 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 17:13:39,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4883280.0, ans=0.0 2024-08-20 17:13:40,865 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 17:13:46,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2024-08-20 17:13:54,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4883380.0, ans=0.0 2024-08-20 17:14:01,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4883380.0, ans=0.0 2024-08-20 17:14:18,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.324e+01 2.479e+01 2.720e+01 4.062e+01, threshold=4.958e+01, percent-clipped=0.0 2024-08-20 17:14:22,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4883480.0, ans=0.0 2024-08-20 17:14:28,945 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 17:14:38,037 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 22 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-20 17:14:58,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4883580.0, ans=0.125 2024-08-20 17:15:25,199 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14200, loss[loss=0.1123, beats_loss=0.008858, ecapa_loss=0.0001207, whisper_loss=0.1023, over 14564.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.000139, whisper_loss=0.09004, over 3762638.90 frames. 
], batch size: 56, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:15:40,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4883780.0, ans=0.1 2024-08-20 17:15:40,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.57 vs. limit=22.5 2024-08-20 17:15:42,155 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 17:15:44,479 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 17 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 17:15:54,220 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.962e+00 2024-08-20 17:16:11,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4883980.0, ans=0.125 2024-08-20 17:16:11,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=4883980.0, ans=0.1 2024-08-20 17:16:14,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2024-08-20 17:16:14,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2024-08-20 17:16:34,047 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 17:16:50,621 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
17 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 17:16:55,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4884180.0, ans=0.125 2024-08-20 17:17:10,849 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14250, loss[loss=0.1101, beats_loss=0.009875, ecapa_loss=0.0001555, whisper_loss=0.09863, over 17133.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.09017, over 3765910.47 frames. ], batch size: 67, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:17:35,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4884380.0, ans=0.05 2024-08-20 17:17:37,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4884380.0, ans=0.2 2024-08-20 17:17:46,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=22.5 2024-08-20 17:17:48,024 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 17:17:50,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4884380.0, ans=0.025 2024-08-20 17:17:53,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.287e+01 2.497e+01 2.839e+01 4.280e+02, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 17:18:14,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-20 17:18:20,193 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 17:18:20,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4884580.0, ans=0.2 2024-08-20 17:18:46,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4884680.0, ans=0.125 2024-08-20 17:18:53,727 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14300, loss[loss=0.1165, beats_loss=0.01056, ecapa_loss=0.000124, whisper_loss=0.1047, over 22870.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001379, whisper_loss=0.09077, over 3774826.43 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:19:09,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2024-08-20 17:19:20,642 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 17:19:33,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4884880.0, ans=0.1 2024-08-20 17:19:35,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4884980.0, ans=0.0 2024-08-20 17:19:41,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4884980.0, ans=0.07 2024-08-20 17:19:50,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4884980.0, ans=0.125 2024-08-20 17:19:50,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.66 vs. 
limit=15.0 2024-08-20 17:19:56,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4885080.0, ans=0.025 2024-08-20 17:20:00,964 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 33 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 17:20:38,292 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14350, loss[loss=0.08452, beats_loss=0.01106, ecapa_loss=0.0001501, whisper_loss=0.07196, over 22216.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001391, whisper_loss=0.09063, over 3761635.56 frames. ], batch size: 94, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:20:39,132 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-20 17:20:45,316 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 17:20:50,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-20 17:20:51,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4885280.0, ans=0.2 2024-08-20 17:21:03,961 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 13 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 17:21:10,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.46 vs. 
limit=15.0 2024-08-20 17:21:19,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.614e+01 2.424e+01 2.742e+01 3.115e+01 1.804e+02, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 17:22:04,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4885680.0, ans=0.015 2024-08-20 17:22:17,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4885780.0, ans=0.125 2024-08-20 17:22:18,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0 2024-08-20 17:22:18,668 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14400, loss[loss=0.08219, beats_loss=0.01309, ecapa_loss=0.0001258, whisper_loss=0.06785, over 13889.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001385, whisper_loss=0.08983, over 3771258.76 frames. ], batch size: 57, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:22:31,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4885780.0, ans=0.1 2024-08-20 17:22:41,135 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-20 17:22:44,378 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 27 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 17:22:57,507 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 17 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-20 17:23:12,637 INFO [train_multi_KD3.py:845] (0/4) A total of 97 cuts. 
26 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-20 17:23:16,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4886080.0, ans=0.1 2024-08-20 17:23:46,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4886180.0, ans=0.0 2024-08-20 17:23:57,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4886280.0, ans=0.125 2024-08-20 17:23:58,245 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14450, loss[loss=0.09819, beats_loss=0.01051, ecapa_loss=0.0001379, whisper_loss=0.0863, over 22280.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001392, whisper_loss=0.08921, over 3792962.85 frames. ], batch size: 90, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:23:59,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0 2024-08-20 17:24:41,906 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.260e+01 2.488e+01 2.810e+01 3.938e+01, threshold=4.976e+01, percent-clipped=0.0 2024-08-20 17:24:45,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=12.0 2024-08-20 17:25:07,859 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-20 17:25:28,801 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 20 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 17:25:36,757 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 17 from LS+wenet, 6 from Vox, 31 fro AS 2024-08-20 17:25:38,769 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 
17 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-20 17:25:40,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=4886780.0, ans=0.1 2024-08-20 17:25:40,962 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14500, loss[loss=0.08246, beats_loss=0.01258, ecapa_loss=0.0001036, whisper_loss=0.06885, over 20328.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001387, whisper_loss=0.08979, over 3813261.02 frames. ], batch size: 81, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:25:54,016 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 18 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 17:26:20,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4886980.0, ans=0.05 2024-08-20 17:27:25,744 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14550, loss[loss=0.08928, beats_loss=0.008582, ecapa_loss=0.0001459, whisper_loss=0.07924, over 14625.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.0001392, whisper_loss=0.09055, over 3806747.72 frames. ], batch size: 56, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:27:46,681 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 17:28:03,347 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 28 from LS+wenet, 12 from Vox, 46 fro AS 2024-08-20 17:28:04,487 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 17:28:04,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4887380.0, ans=0.0 2024-08-20 17:28:08,436 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 17:28:11,720 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.302e+01 2.517e+01 2.766e+01 3.665e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-20 17:28:26,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-20 17:29:09,865 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-20 17:29:15,855 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14600, loss[loss=0.1113, beats_loss=0.01062, ecapa_loss=0.0001079, whisper_loss=0.09957, over 21831.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.000139, whisper_loss=0.09056, over 3818480.76 frames. ], batch size: 84, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:30:01,500 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 17:30:32,567 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 17:30:52,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4888180.0, ans=0.2 2024-08-20 17:30:55,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.09 vs. limit=22.5 2024-08-20 17:30:59,511 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 17:31:02,581 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14650, loss[loss=0.08926, beats_loss=0.01135, ecapa_loss=0.0001289, whisper_loss=0.07662, over 13914.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.09089, over 3798259.86 frames. 
], batch size: 55, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:31:12,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4888280.0, ans=0.125 2024-08-20 17:31:43,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4888380.0, ans=0.2 2024-08-20 17:31:48,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.228e+01 2.454e+01 2.785e+01 6.684e+01, threshold=4.907e+01, percent-clipped=2.0 2024-08-20 17:31:53,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4888480.0, ans=0.0 2024-08-20 17:32:00,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4888480.0, ans=0.125 2024-08-20 17:32:14,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4888580.0, ans=0.0 2024-08-20 17:32:31,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4888680.0, ans=0.0 2024-08-20 17:32:38,467 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 35 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-20 17:32:38,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4888680.0, ans=0.2 2024-08-20 17:32:51,508 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14700, loss[loss=0.1089, beats_loss=0.01184, ecapa_loss=0.0001201, whisper_loss=0.09586, over 23995.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01025, ecapa_loss=0.0001404, whisper_loss=0.09139, over 3829623.72 frames. ], batch size: 97, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:33:01,105 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 17:33:03,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4888780.0, ans=0.0 2024-08-20 17:33:23,104 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-20 17:33:27,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4888880.0, ans=0.125 2024-08-20 17:33:33,552 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 19 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 17:33:38,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4888980.0, ans=0.0 2024-08-20 17:33:41,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4888980.0, ans=0.125 2024-08-20 17:33:44,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4888980.0, ans=0.1 2024-08-20 17:33:49,499 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 27 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-20 17:33:53,627 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-20 17:33:58,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.67 vs. limit=10.0 2024-08-20 17:34:08,263 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
28 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-20 17:34:23,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4889180.0, ans=0.0 2024-08-20 17:34:34,573 INFO [train_multi_KD3.py:1117] (0/4) Epoch 33, batch 14750, loss[loss=0.11, beats_loss=0.01102, ecapa_loss=0.0001019, whisper_loss=0.09798, over 24164.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001397, whisper_loss=0.0905, over 3823162.44 frames. ], batch size: 93, lr: 1.84e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:34:47,785 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 17 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-20 17:34:52,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4889280.0, ans=0.2 2024-08-20 17:34:56,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4889380.0, ans=0.1 2024-08-20 17:35:04,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4889380.0, ans=0.125 2024-08-20 17:35:04,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4889380.0, ans=0.125 2024-08-20 17:35:06,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4889380.0, ans=0.125 2024-08-20 17:35:17,774 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.334e+01 2.652e+01 2.948e+01 4.454e+01, threshold=5.304e+01, percent-clipped=0.0 2024-08-20 17:35:47,450 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 27 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-20 17:35:56,513 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 
15 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 17:36:00,368 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-33.pt 2024-08-20 17:36:34,091 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 0, loss[loss=0.07894, beats_loss=0.009002, ecapa_loss=0.0001344, whisper_loss=0.06859, over 15423.00 frames. ], tot_loss[loss=0.07894, beats_loss=0.009002, ecapa_loss=0.0001344, whisper_loss=0.06859, over 15423.00 frames. ], batch size: 61, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:36:34,092 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 17:37:09,421 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005123, whisper_loss=0.2495, over 931116.00 frames. 2024-08-20 17:37:31,885 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on SV_voxceleb1: loss=0.004, beats_loss=0, ecapa_loss=0.0004, whisper_loss=0, over 944235.00 frames. 2024-08-20 17:39:14,502 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 17:39:14,505 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 17:39:19,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4889690.0, ans=0.125 2024-08-20 17:39:35,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4889690.0, ans=0.95 2024-08-20 17:39:37,190 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 
14 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-20 17:39:52,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4889790.0, ans=0.1 2024-08-20 17:40:26,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-08-20 17:40:45,330 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 23 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 17:40:45,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4889990.0, ans=0.125 2024-08-20 17:41:01,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-08-20 17:41:17,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4890190.0, ans=0.2 2024-08-20 17:41:19,467 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 50, loss[loss=0.112, beats_loss=0.008028, ecapa_loss=0.0001509, whisper_loss=0.1024, over 22273.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.008949, ecapa_loss=0.0001437, whisper_loss=0.09181, over 843693.50 frames. 
], batch size: 87, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:41:29,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4890190.0, ans=0.125 2024-08-20 17:41:34,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4890190.0, ans=0.125 2024-08-20 17:41:49,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4890290.0, ans=0.1 2024-08-20 17:42:15,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4890390.0, ans=0.2 2024-08-20 17:42:30,970 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 17:42:33,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.429e+01 2.698e+01 2.927e+01 5.810e+01, threshold=5.396e+01, percent-clipped=1.0 2024-08-20 17:43:10,136 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 17:43:13,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4890590.0, ans=0.0 2024-08-20 17:43:18,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4890590.0, ans=0.125 2024-08-20 17:43:23,665 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 100, loss[loss=0.1031, beats_loss=0.008992, ecapa_loss=0.0001236, whisper_loss=0.09289, over 16776.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.008945, ecapa_loss=0.0001459, whisper_loss=0.0909, over 1517966.46 frames. 
], batch size: 65, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:43:33,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4890690.0, ans=0.0 2024-08-20 17:43:41,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4890690.0, ans=0.125 2024-08-20 17:44:15,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=12.0 2024-08-20 17:44:27,438 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 15 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 17:45:09,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4891090.0, ans=0.125 2024-08-20 17:45:09,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4891090.0, ans=0.1 2024-08-20 17:45:30,984 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 150, loss[loss=0.1318, beats_loss=0.007111, ecapa_loss=0.0001058, whisper_loss=0.1236, over 14574.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.009142, ecapa_loss=0.0001425, whisper_loss=0.08998, over 1992407.42 frames. ], batch size: 51, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:45:36,380 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-20 17:46:11,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4891290.0, ans=0.025 2024-08-20 17:46:15,496 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
21 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 17:46:19,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4891390.0, ans=0.07 2024-08-20 17:46:35,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.414e+01 2.624e+01 2.993e+01 4.090e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-20 17:46:50,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4891490.0, ans=0.0 2024-08-20 17:46:56,108 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.192e+05 2024-08-20 17:47:12,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4891590.0, ans=0.125 2024-08-20 17:47:15,762 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 200, loss[loss=0.1047, beats_loss=0.01113, ecapa_loss=0.0001465, whisper_loss=0.09208, over 21995.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.009486, ecapa_loss=0.0001424, whisper_loss=0.08982, over 2372753.31 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:47:17,159 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:47:42,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2024-08-20 17:47:45,814 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 13 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-20 17:47:58,534 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 
14 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 17:48:03,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4891890.0, ans=0.125 2024-08-20 17:48:09,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4891890.0, ans=0.125 2024-08-20 17:48:50,915 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 250, loss[loss=0.108, beats_loss=0.01137, ecapa_loss=0.0001181, whisper_loss=0.09546, over 23882.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.009707, ecapa_loss=0.0001423, whisper_loss=0.09023, over 2642756.40 frames. ], batch size: 92, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:48:56,058 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 17:49:11,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4892290.0, ans=0.0 2024-08-20 17:49:15,175 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 
20 from LS+wenet, 9 from Vox, 24 fro AS 2024-08-20 17:49:23,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4892290.0, ans=0.2 2024-08-20 17:49:23,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4892290.0, ans=0.95 2024-08-20 17:49:28,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4892390.0, ans=0.125 2024-08-20 17:49:43,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4892390.0, ans=0.0 2024-08-20 17:49:48,357 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.222e+01 2.433e+01 2.766e+01 4.202e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-20 17:50:01,302 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 17:50:14,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4892590.0, ans=0.1 2024-08-20 17:50:23,578 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 300, loss[loss=0.08711, beats_loss=0.01182, ecapa_loss=0.0001929, whisper_loss=0.07336, over 21189.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.009947, ecapa_loss=0.0001417, whisper_loss=0.08934, over 2860015.52 frames. ], batch size: 94, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:50:28,158 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 
26 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 17:50:35,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4892690.0, ans=0.125 2024-08-20 17:50:36,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-08-20 17:51:12,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4892890.0, ans=0.125 2024-08-20 17:51:55,178 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 350, loss[loss=0.08453, beats_loss=0.01032, ecapa_loss=0.0001386, whisper_loss=0.07283, over 16915.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.0101, ecapa_loss=0.0001397, whisper_loss=0.08827, over 3040329.08 frames. ], batch size: 66, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:52:01,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2024-08-20 17:52:23,741 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 17:52:27,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4893290.0, ans=0.1 2024-08-20 17:52:36,949 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
19 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-20 17:52:40,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4893390.0, ans=0.2 2024-08-20 17:52:48,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.330e+01 2.558e+01 2.874e+01 1.855e+02, threshold=5.116e+01, percent-clipped=1.0 2024-08-20 17:53:01,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4893490.0, ans=0.1 2024-08-20 17:53:19,561 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 17:53:26,065 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 400, loss[loss=0.1131, beats_loss=0.006616, ecapa_loss=0.0001469, whisper_loss=0.105, over 18847.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01004, ecapa_loss=0.0001403, whisper_loss=0.08892, over 3163734.25 frames. ], batch size: 71, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:53:32,065 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-20 17:53:39,479 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-20 17:53:50,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4893790.0, ans=0.0 2024-08-20 17:53:59,823 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 22 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-20 17:54:13,316 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 17:54:22,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4893990.0, ans=0.0 2024-08-20 17:54:31,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4893990.0, ans=0.2 2024-08-20 17:54:34,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. limit=15.0 2024-08-20 17:54:37,345 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 17:54:55,810 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 450, loss[loss=0.07834, beats_loss=0.01237, ecapa_loss=0.0001096, whisper_loss=0.06487, over 17428.00 frames. ], tot_loss[loss=0.09993, beats_loss=0.01018, ecapa_loss=0.0001403, whisper_loss=0.08835, over 3291946.33 frames. ], batch size: 68, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:55:01,807 WARNING [optim.py:496] (0/4) Scaling gradients by 0.014215901494026184, model_norm_threshold=51.16255569458008 2024-08-20 17:55:01,983 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.649e+06, grad_sumsq=5.014e+05, orig_rms_sq=3.288e+00 2024-08-20 17:55:29,174 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 17:55:35,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.32 vs. 
limit=15.0 2024-08-20 17:55:50,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4894490.0, ans=0.125 2024-08-20 17:55:51,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.355e+01 2.533e+01 2.775e+01 3.599e+03, threshold=5.067e+01, percent-clipped=2.0 2024-08-20 17:56:11,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=15.0 2024-08-20 17:56:27,829 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 500, loss[loss=0.1114, beats_loss=0.01064, ecapa_loss=0.0001159, whisper_loss=0.09957, over 23922.00 frames. ], tot_loss[loss=0.09984, beats_loss=0.01018, ecapa_loss=0.0001409, whisper_loss=0.08825, over 3373167.15 frames. ], batch size: 91, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:56:31,791 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-20 17:56:33,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4894690.0, ans=0.0 2024-08-20 17:56:37,101 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.711e+05 2024-08-20 17:56:42,348 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 24 from LS+wenet, 32 from Vox, 37 fro AS 2024-08-20 17:56:52,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4894790.0, ans=0.125 2024-08-20 17:56:58,182 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 17:57:05,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4894890.0, ans=0.0 2024-08-20 17:57:17,338 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 17:57:23,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4894990.0, ans=0.0 2024-08-20 17:57:31,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4894990.0, ans=0.125 2024-08-20 17:57:35,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.011e+05 2024-08-20 17:57:38,889 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 22 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 17:57:53,281 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 16 from LS+wenet, 25 from Vox, 17 fro AS 2024-08-20 17:57:55,133 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-20 17:57:58,251 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 550, loss[loss=0.08218, beats_loss=0.01151, ecapa_loss=0.0001198, whisper_loss=0.06947, over 21014.00 frames. ], tot_loss[loss=0.09973, beats_loss=0.01019, ecapa_loss=0.0001417, whisper_loss=0.08812, over 3463534.41 frames. 
], batch size: 81, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:58:00,900 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 17:58:10,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4895190.0, ans=0.0 2024-08-20 17:58:28,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4895290.0, ans=0.125 2024-08-20 17:58:33,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4895390.0, ans=0.125 2024-08-20 17:58:37,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4895390.0, ans=0.125 2024-08-20 17:58:52,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.257e+01 2.490e+01 2.718e+01 3.602e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-20 17:59:07,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.24 vs. limit=15.0 2024-08-20 17:59:14,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4895590.0, ans=0.125 2024-08-20 17:59:19,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4895590.0, ans=0.09899494936611666 2024-08-20 17:59:27,725 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 600, loss[loss=0.07892, beats_loss=0.01302, ecapa_loss=0.0001291, whisper_loss=0.06461, over 18390.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01022, ecapa_loss=0.0001414, whisper_loss=0.08844, over 3530405.53 frames. 
], batch size: 75, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 17:59:29,354 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 16 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 17:59:33,011 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-20 17:59:56,221 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-20 18:00:19,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.23 vs. limit=10.0 2024-08-20 18:00:21,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4895990.0, ans=0.125 2024-08-20 18:00:41,598 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 16 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-20 18:00:41,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4896090.0, ans=0.0 2024-08-20 18:00:55,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4896190.0, ans=0.125 2024-08-20 18:00:56,855 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 650, loss[loss=0.1047, beats_loss=0.01142, ecapa_loss=0.000148, whisper_loss=0.09181, over 22039.00 frames. ], tot_loss[loss=0.09965, beats_loss=0.01027, ecapa_loss=0.0001397, whisper_loss=0.08798, over 3586359.15 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:01:15,253 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 16 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-20 18:01:34,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4896390.0, ans=0.125 2024-08-20 18:01:37,851 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
20 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 18:01:47,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4896390.0, ans=0.125 2024-08-20 18:01:48,722 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-20 18:01:51,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.174e+01 2.397e+01 2.738e+01 4.303e+01, threshold=4.793e+01, percent-clipped=0.0 2024-08-20 18:02:07,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4896590.0, ans=0.1 2024-08-20 18:02:26,545 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 700, loss[loss=0.1184, beats_loss=0.008251, ecapa_loss=0.0001566, whisper_loss=0.1086, over 22386.00 frames. ], tot_loss[loss=0.09948, beats_loss=0.01031, ecapa_loss=0.0001397, whisper_loss=0.08776, over 3620846.60 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:02:34,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4896690.0, ans=0.125 2024-08-20 18:02:35,852 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 18:02:43,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2024-08-20 18:03:04,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4896890.0, ans=0.2 2024-08-20 18:03:13,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4896890.0, ans=0.125 2024-08-20 18:03:15,956 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
23 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 18:03:28,249 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-20 18:03:49,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-20 18:03:57,684 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 750, loss[loss=0.1001, beats_loss=0.01127, ecapa_loss=0.0001352, whisper_loss=0.08747, over 21169.00 frames. ], tot_loss[loss=0.09974, beats_loss=0.01027, ecapa_loss=0.0001385, whisper_loss=0.08808, over 3646594.33 frames. ], batch size: 86, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:03:59,684 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 18:04:11,960 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-20 18:04:19,537 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 25 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 18:04:50,445 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.248e+01 2.436e+01 2.663e+01 4.558e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-20 18:04:55,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.06 vs. limit=22.5 2024-08-20 18:05:04,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4897490.0, ans=0.2 2024-08-20 18:05:21,298 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 32 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 18:05:25,689 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 800, loss[loss=0.09397, beats_loss=0.01368, ecapa_loss=0.0001242, whisper_loss=0.07904, over 17747.00 frames. 
], tot_loss[loss=0.1002, beats_loss=0.01029, ecapa_loss=0.0001378, whisper_loss=0.08852, over 3651283.96 frames. ], batch size: 72, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:05:35,196 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 25 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 18:05:41,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4897690.0, ans=0.125 2024-08-20 18:05:59,962 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.599e+01 2024-08-20 18:06:02,940 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 27 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 18:06:10,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=12.0 2024-08-20 18:06:16,525 WARNING [optim.py:496] (0/4) Scaling gradients by 0.034612394869327545, model_norm_threshold=48.72909927368164 2024-08-20 18:06:16,701 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.559e+05, grad_sumsq=3.559e+05, orig_rms_sq=1.000e+00 2024-08-20 18:06:22,963 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.050e+01 2024-08-20 18:06:31,384 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 18:06:52,582 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 850, loss[loss=0.1203, beats_loss=0.01149, ecapa_loss=0.0001542, whisper_loss=0.1073, over 18659.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01031, ecapa_loss=0.0001378, whisper_loss=0.0884, over 3676978.22 frames. 
], batch size: 77, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:07:14,415 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 18:07:44,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=4898490.0, ans=0.02 2024-08-20 18:07:45,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.491e+01 2.263e+01 2.466e+01 2.785e+01 1.408e+03, threshold=4.933e+01, percent-clipped=1.0 2024-08-20 18:08:19,167 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:08:22,073 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 900, loss[loss=0.08365, beats_loss=0.008013, ecapa_loss=0.0001454, whisper_loss=0.07418, over 15751.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01033, ecapa_loss=0.0001381, whisper_loss=0.08848, over 3674499.60 frames. ], batch size: 61, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:08:28,018 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 30 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 18:08:37,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4898690.0, ans=0.125 2024-08-20 18:08:38,611 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 18 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 18:09:03,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4898890.0, ans=0.125 2024-08-20 18:09:03,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4898890.0, ans=0.5 2024-08-20 18:09:12,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.10 vs. 
limit=15.0 2024-08-20 18:09:23,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4898990.0, ans=0.125 2024-08-20 18:09:27,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4898990.0, ans=0.05 2024-08-20 18:09:32,866 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 16 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-20 18:09:33,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4899090.0, ans=0.1 2024-08-20 18:09:42,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-08-20 18:09:44,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=15.0 2024-08-20 18:09:52,098 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 950, loss[loss=0.09374, beats_loss=0.01101, ecapa_loss=0.000144, whisper_loss=0.08129, over 22525.00 frames. ], tot_loss[loss=0.09984, beats_loss=0.01033, ecapa_loss=0.0001373, whisper_loss=0.08813, over 3736826.57 frames. ], batch size: 93, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:10:15,336 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-20 18:10:18,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4899290.0, ans=0.0 2024-08-20 18:10:32,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.11 vs. 
limit=15.0 2024-08-20 18:10:40,104 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03566131740808487, model_norm_threshold=49.32598114013672 2024-08-20 18:10:40,276 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.124e+05, grad_sumsq=3.124e+05, orig_rms_sq=1.000e+00 2024-08-20 18:10:42,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4899490.0, ans=0.125 2024-08-20 18:10:43,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.227e+01 2.460e+01 2.712e+01 1.383e+03, threshold=4.920e+01, percent-clipped=1.0 2024-08-20 18:10:44,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4899490.0, ans=0.0 2024-08-20 18:11:01,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4899590.0, ans=0.125 2024-08-20 18:11:12,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2024-08-20 18:11:14,325 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.275e-02 2024-08-20 18:11:20,425 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1000, loss[loss=0.1155, beats_loss=0.009288, ecapa_loss=0.0001304, whisper_loss=0.1049, over 23293.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01026, ecapa_loss=0.0001371, whisper_loss=0.08877, over 3767227.20 frames. ], batch size: 90, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:11:25,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.32 vs. 
limit=22.5 2024-08-20 18:11:26,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4899690.0, ans=0.1 2024-08-20 18:11:32,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4899690.0, ans=0.09899494936611666 2024-08-20 18:11:46,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4899790.0, ans=10.0 2024-08-20 18:12:05,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4899890.0, ans=0.125 2024-08-20 18:12:05,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4899890.0, ans=0.125 2024-08-20 18:12:06,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5 2024-08-20 18:12:17,862 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 22 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-20 18:12:38,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-08-20 18:12:50,726 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1050, loss[loss=0.08993, beats_loss=0.008637, ecapa_loss=0.0001228, whisper_loss=0.08006, over 14512.00 frames. ], tot_loss[loss=0.09988, beats_loss=0.01023, ecapa_loss=0.0001376, whisper_loss=0.08827, over 3771591.27 frames. ], batch size: 55, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:13:01,298 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 18:13:24,828 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
20 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 18:13:43,083 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.230e+01 2.407e+01 2.713e+01 3.528e+01, threshold=4.815e+01, percent-clipped=0.0 2024-08-20 18:13:45,348 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 25 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 18:13:52,222 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 30 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-20 18:13:52,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4900490.0, ans=10.0 2024-08-20 18:13:52,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4900490.0, ans=0.0 2024-08-20 18:14:04,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4900590.0, ans=0.95 2024-08-20 18:14:09,119 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 18:14:10,795 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 18 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-20 18:14:17,683 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1100, loss[loss=0.1072, beats_loss=0.008371, ecapa_loss=0.0001415, whisper_loss=0.09745, over 14836.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01019, ecapa_loss=0.000138, whisper_loss=0.08883, over 3736398.54 frames. ], batch size: 57, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:14:40,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. 
limit=15.0 2024-08-20 18:14:43,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4900790.0, ans=0.0 2024-08-20 18:14:44,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=12.0 2024-08-20 18:14:53,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4900890.0, ans=0.125 2024-08-20 18:15:05,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4900890.0, ans=0.035 2024-08-20 18:15:10,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-08-20 18:15:13,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4900990.0, ans=0.1 2024-08-20 18:15:13,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4900990.0, ans=0.09899494936611666 2024-08-20 18:15:19,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4900990.0, ans=0.125 2024-08-20 18:15:20,792 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
24 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 18:15:33,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4901090.0, ans=0.2 2024-08-20 18:15:40,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4901090.0, ans=0.125 2024-08-20 18:15:44,766 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1150, loss[loss=0.09387, beats_loss=0.009594, ecapa_loss=0.0001226, whisper_loss=0.08305, over 15495.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01012, ecapa_loss=0.0001375, whisper_loss=0.08929, over 3735298.17 frames. ], batch size: 62, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:15:47,546 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 21 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 18:15:53,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-08-20 18:16:03,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4901290.0, ans=0.1 2024-08-20 18:16:38,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.288e+01 2.529e+01 2.855e+01 5.753e+01, threshold=5.059e+01, percent-clipped=2.0 2024-08-20 18:17:04,147 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-20 18:17:14,345 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1200, loss[loss=0.111, beats_loss=0.01037, ecapa_loss=0.0001342, whisper_loss=0.09933, over 22274.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01016, ecapa_loss=0.000138, whisper_loss=0.08968, over 3768860.23 frames. 
], batch size: 84, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:17:48,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.77 vs. limit=10.0 2024-08-20 18:18:17,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4901990.0, ans=0.125 2024-08-20 18:18:20,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4901990.0, ans=0.125 2024-08-20 18:18:43,503 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1250, loss[loss=0.07737, beats_loss=0.01229, ecapa_loss=0.0001378, whisper_loss=0.0637, over 15810.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01022, ecapa_loss=0.0001377, whisper_loss=0.08925, over 3752555.62 frames. ], batch size: 65, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:18:44,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4902190.0, ans=0.125 2024-08-20 18:18:48,766 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 12 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 18:18:56,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4902190.0, ans=0.125 2024-08-20 18:19:01,334 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 18:19:03,146 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-20 18:19:09,104 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 18:19:35,948 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.263e+01 2.557e+01 2.836e+01 4.039e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-20 18:19:59,477 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-20 18:20:02,736 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-20 18:20:11,666 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1300, loss[loss=0.08067, beats_loss=0.01288, ecapa_loss=0.0001625, whisper_loss=0.06617, over 16526.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01032, ecapa_loss=0.000137, whisper_loss=0.08876, over 3754705.34 frames. ], batch size: 69, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:20:35,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4902790.0, ans=0.0 2024-08-20 18:21:04,495 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 34 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-20 18:21:05,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.10 vs. limit=5.0 2024-08-20 18:21:36,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4903090.0, ans=0.125 2024-08-20 18:21:41,567 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1350, loss[loss=0.1001, beats_loss=0.0115, ecapa_loss=0.0001101, whisper_loss=0.0875, over 22236.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01033, ecapa_loss=0.0001364, whisper_loss=0.08923, over 3754903.45 frames. 
], batch size: 86, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:21:47,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4903190.0, ans=0.125 2024-08-20 18:22:16,876 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 18:22:24,379 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 15 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 18:22:29,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4903390.0, ans=0.125 2024-08-20 18:22:34,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.265e+01 2.449e+01 2.794e+01 7.955e+01, threshold=4.899e+01, percent-clipped=1.0 2024-08-20 18:22:51,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=12.0 2024-08-20 18:23:00,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4903590.0, ans=0.0 2024-08-20 18:23:01,564 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 32 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-20 18:23:03,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4903590.0, ans=0.1 2024-08-20 18:23:10,310 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1400, loss[loss=0.1087, beats_loss=0.01059, ecapa_loss=0.0001436, whisper_loss=0.09664, over 21515.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001369, whisper_loss=0.08928, over 3765017.71 frames. 
], batch size: 87, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:23:23,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4903690.0, ans=0.125 2024-08-20 18:23:26,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4903790.0, ans=0.125 2024-08-20 18:23:34,507 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 18:23:45,384 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 35 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 18:24:15,611 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 22 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 18:24:23,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4904090.0, ans=0.2 2024-08-20 18:24:29,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4904090.0, ans=0.0 2024-08-20 18:24:38,117 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1450, loss[loss=0.09805, beats_loss=0.01251, ecapa_loss=9.907e-05, whisper_loss=0.08455, over 17565.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01033, ecapa_loss=0.0001375, whisper_loss=0.08889, over 3748146.49 frames. ], batch size: 67, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:24:44,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2024-08-20 18:24:55,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2024-08-20 18:25:10,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4904290.0, ans=0.125 2024-08-20 18:25:14,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2024-08-20 18:25:28,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4904390.0, ans=0.125 2024-08-20 18:25:32,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+01 2.130e+01 2.447e+01 2.778e+01 4.776e+01, threshold=4.894e+01, percent-clipped=0.0 2024-08-20 18:26:07,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4904490.0, ans=0.125 2024-08-20 18:26:29,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4904590.0, ans=0.125 2024-08-20 18:26:32,486 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1500, loss[loss=0.09764, beats_loss=0.0104, ecapa_loss=0.0001193, whisper_loss=0.08605, over 15507.00 frames. ], tot_loss[loss=0.09938, beats_loss=0.01043, ecapa_loss=0.0001357, whisper_loss=0.08758, over 3744873.01 frames. ], batch size: 59, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:26:35,306 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 19 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 18:26:36,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.27 vs. limit=22.5 2024-08-20 18:26:42,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4904690.0, ans=0.125 2024-08-20 18:26:49,550 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 18:26:51,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2024-08-20 18:26:55,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4904790.0, ans=0.0 2024-08-20 18:27:08,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4904890.0, ans=0.125 2024-08-20 18:27:19,546 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 18:27:38,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4904990.0, ans=0.0 2024-08-20 18:27:55,884 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 20 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-20 18:27:56,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4905090.0, ans=0.0 2024-08-20 18:28:00,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0 2024-08-20 18:28:04,913 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1550, loss[loss=0.08649, beats_loss=0.01091, ecapa_loss=0.0001356, whisper_loss=0.07423, over 20190.00 frames. ], tot_loss[loss=0.09928, beats_loss=0.01044, ecapa_loss=0.0001354, whisper_loss=0.08749, over 3730903.02 frames. ], batch size: 80, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:28:18,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4905190.0, ans=0.0 2024-08-20 18:28:46,116 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 18:29:01,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.222e+01 2.378e+01 2.673e+01 8.948e+01, threshold=4.757e+01, percent-clipped=1.0 2024-08-20 18:29:11,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4905490.0, ans=0.125 2024-08-20 18:29:21,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4905590.0, ans=0.125 2024-08-20 18:29:22,758 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.799e+00 2024-08-20 18:29:31,606 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 28 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-20 18:29:38,724 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1600, loss[loss=0.1057, beats_loss=0.008181, ecapa_loss=0.0001503, whisper_loss=0.09597, over 15639.00 frames. ], tot_loss[loss=0.09958, beats_loss=0.01037, ecapa_loss=0.0001361, whisper_loss=0.08784, over 3766444.92 frames. ], batch size: 61, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:29:43,474 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 35 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 18:29:52,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4905690.0, ans=0.125 2024-08-20 18:29:55,777 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 21 from LS+wenet, 14 from Vox, 14 fro AS 2024-08-20 18:30:11,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. 
limit=6.0 2024-08-20 18:30:15,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-20 18:30:21,478 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 18:30:33,981 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0239420123398304, model_norm_threshold=47.56806564331055 2024-08-20 18:30:34,152 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.639e+05, grad_sumsq=5.639e+05, orig_rms_sq=1.000e+00 2024-08-20 18:31:02,015 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05985227972269058, model_norm_threshold=47.56806564331055 2024-08-20 18:31:02,186 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.784e+04, grad_sumsq=6.784e+04, orig_rms_sq=1.000e+00 2024-08-20 18:31:10,630 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1650, loss[loss=0.1233, beats_loss=0.009357, ecapa_loss=0.0001739, whisper_loss=0.1123, over 18844.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01029, ecapa_loss=0.0001357, whisper_loss=0.08835, over 3778473.82 frames. ], batch size: 74, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:31:18,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4906190.0, ans=0.125 2024-08-20 18:31:21,701 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
24 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-20 18:31:21,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4906190.0, ans=0.125 2024-08-20 18:31:28,978 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 18:31:30,562 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 18:31:34,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4906290.0, ans=0.125 2024-08-20 18:31:35,985 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 18:31:38,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4906290.0, ans=0.1 2024-08-20 18:31:43,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4906290.0, ans=0.1 2024-08-20 18:31:48,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4906390.0, ans=0.125 2024-08-20 18:32:04,560 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.417e+01 2.742e+01 3.192e+01 1.987e+03, threshold=5.484e+01, percent-clipped=2.0 2024-08-20 18:32:12,096 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
26 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-20 18:32:25,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4906590.0, ans=0.125 2024-08-20 18:32:30,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4906590.0, ans=0.1 2024-08-20 18:32:39,584 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1700, loss[loss=0.1094, beats_loss=0.009497, ecapa_loss=0.000129, whisper_loss=0.0986, over 14440.00 frames. ], tot_loss[loss=0.09959, beats_loss=0.01031, ecapa_loss=0.0001357, whisper_loss=0.08793, over 3789132.53 frames. ], batch size: 55, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:32:45,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4906690.0, ans=0.09899494936611666 2024-08-20 18:32:46,800 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 18:33:05,450 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 20 from LS+wenet, 12 from Vox, 20 fro AS 2024-08-20 18:33:24,398 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 18:33:35,883 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-20 18:33:53,714 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 20 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-20 18:33:58,892 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 32 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-20 18:34:00,812 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 20 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 18:34:02,682 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-20 18:34:04,469 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
18 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 18:34:04,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4907090.0, ans=0.0 2024-08-20 18:34:10,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4907190.0, ans=0.125 2024-08-20 18:34:11,703 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1750, loss[loss=0.1024, beats_loss=0.008242, ecapa_loss=0.0001675, whisper_loss=0.09246, over 17314.00 frames. ], tot_loss[loss=0.09957, beats_loss=0.01024, ecapa_loss=0.000136, whisper_loss=0.08797, over 3741118.68 frames. ], batch size: 70, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:34:24,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4907190.0, ans=0.125 2024-08-20 18:34:30,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4907290.0, ans=0.125 2024-08-20 18:34:33,189 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 18:34:44,717 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-20 18:34:51,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.46 vs. 
limit=15.0 2024-08-20 18:35:04,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4907490.0, ans=0.125 2024-08-20 18:35:05,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.254e+01 2.590e+01 2.933e+01 3.656e+02, threshold=5.181e+01, percent-clipped=1.0 2024-08-20 18:35:30,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-20 18:35:40,830 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1800, loss[loss=0.1003, beats_loss=0.01167, ecapa_loss=0.0001114, whisper_loss=0.08756, over 15624.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01018, ecapa_loss=0.0001357, whisper_loss=0.08895, over 3735695.72 frames. ], batch size: 60, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:36:08,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4907790.0, ans=0.125 2024-08-20 18:36:23,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4907890.0, ans=0.125 2024-08-20 18:36:36,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4907990.0, ans=0.125 2024-08-20 18:36:38,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4907990.0, ans=0.0 2024-08-20 18:36:40,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4907990.0, ans=0.0 2024-08-20 18:36:40,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4907990.0, ans=0.0 2024-08-20 18:37:02,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, 
batch_count=4908090.0, ans=0.2 2024-08-20 18:37:09,308 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1850, loss[loss=0.08867, beats_loss=0.01282, ecapa_loss=0.0001358, whisper_loss=0.07449, over 15319.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01022, ecapa_loss=0.0001355, whisper_loss=0.0894, over 3748528.12 frames. ], batch size: 65, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:37:29,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4908290.0, ans=0.1 2024-08-20 18:37:31,013 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 24 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 18:37:31,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4908290.0, ans=0.125 2024-08-20 18:37:51,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4908390.0, ans=0.0 2024-08-20 18:38:04,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.237e+01 2.438e+01 2.771e+01 3.802e+01, threshold=4.876e+01, percent-clipped=0.0 2024-08-20 18:38:11,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4908490.0, ans=0.0 2024-08-20 18:38:30,343 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 18:38:40,219 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 18:38:43,220 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1900, loss[loss=0.08598, beats_loss=0.0112, ecapa_loss=0.0001269, whisper_loss=0.07352, over 18216.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01026, ecapa_loss=0.0001352, whisper_loss=0.0887, over 3743035.55 frames. 
], batch size: 71, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:38:43,785 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:38:45,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4908690.0, ans=0.125 2024-08-20 18:39:17,807 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 18:39:21,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4908890.0, ans=0.2 2024-08-20 18:40:00,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0 2024-08-20 18:40:04,671 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 19 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-20 18:40:18,052 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 1950, loss[loss=0.1179, beats_loss=0.008702, ecapa_loss=0.0001299, whisper_loss=0.1079, over 18472.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01022, ecapa_loss=0.000136, whisper_loss=0.08882, over 3730745.18 frames. ], batch size: 71, lr: 1.81e-03, grad_scale: 1.152921504606847e+18 2024-08-20 18:40:22,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4909190.0, ans=0.125 2024-08-20 18:40:50,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4909290.0, ans=0.125 2024-08-20 18:41:01,292 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
16 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 18:41:09,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4909390.0, ans=0.0 2024-08-20 18:41:14,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.290e+01 2.558e+01 2.755e+01 1.117e+02, threshold=5.115e+01, percent-clipped=1.0 2024-08-20 18:41:20,264 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 19 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-20 18:41:38,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4909590.0, ans=0.125 2024-08-20 18:41:50,629 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2000, loss[loss=0.09648, beats_loss=0.007367, ecapa_loss=0.0001283, whisper_loss=0.08783, over 16030.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01022, ecapa_loss=0.0001352, whisper_loss=0.08916, over 3742844.63 frames. ], batch size: 56, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:41:55,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4909690.0, ans=0.2 2024-08-20 18:42:18,064 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
13 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-20 18:42:40,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4909890.0, ans=0.125 2024-08-20 18:42:45,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4909990.0, ans=0.125 2024-08-20 18:42:50,921 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.588e-01 2024-08-20 18:42:52,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4909990.0, ans=0.125 2024-08-20 18:42:55,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4909990.0, ans=0.125 2024-08-20 18:43:03,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4910090.0, ans=0.125 2024-08-20 18:43:04,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-20 18:43:08,344 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 15 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-20 18:43:10,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4910090.0, ans=0.0 2024-08-20 18:43:11,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=15.0 2024-08-20 18:43:20,486 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2050, loss[loss=0.1152, beats_loss=0.006918, ecapa_loss=0.000168, whisper_loss=0.1066, over 14927.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01023, ecapa_loss=0.0001351, whisper_loss=0.08887, over 3717481.32 frames. 
], batch size: 59, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:43:25,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2024-08-20 18:43:42,815 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-20 18:43:53,785 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-20 18:43:59,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4910390.0, ans=0.1 2024-08-20 18:44:11,733 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 15 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 18:44:14,929 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.216e+01 2.451e+01 2.843e+01 3.787e+02, threshold=4.902e+01, percent-clipped=2.0 2024-08-20 18:44:21,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4910490.0, ans=0.125 2024-08-20 18:44:40,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-08-20 18:44:48,998 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2100, loss[loss=0.1054, beats_loss=0.01234, ecapa_loss=0.0001434, whisper_loss=0.09164, over 13358.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01024, ecapa_loss=0.0001359, whisper_loss=0.08901, over 3681671.56 frames. ], batch size: 54, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:46:05,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.92 vs. 
limit=15.0 2024-08-20 18:46:09,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4911090.0, ans=0.2 2024-08-20 18:46:11,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4911090.0, ans=0.125 2024-08-20 18:46:17,945 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2150, loss[loss=0.09424, beats_loss=0.0116, ecapa_loss=9.762e-05, whisper_loss=0.08166, over 13312.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01031, ecapa_loss=0.0001358, whisper_loss=0.08877, over 3669437.96 frames. ], batch size: 50, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:46:21,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4911190.0, ans=0.125 2024-08-20 18:46:23,598 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 18:46:46,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4911290.0, ans=0.125 2024-08-20 18:46:57,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.42 vs. 
limit=15.0 2024-08-20 18:47:09,446 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05905143544077873, model_norm_threshold=49.024410247802734 2024-08-20 18:47:09,930 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.632e+04, grad_sumsq=6.177e+06, orig_rms_sq=1.074e-02 2024-08-20 18:47:12,871 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.265e+01 2.540e+01 2.946e+01 8.302e+02, threshold=5.079e+01, percent-clipped=3.0 2024-08-20 18:47:23,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2024-08-20 18:47:46,282 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2200, loss[loss=0.0877, beats_loss=0.01009, ecapa_loss=0.0001345, whisper_loss=0.07626, over 14894.00 frames. ], tot_loss[loss=0.09959, beats_loss=0.01032, ecapa_loss=0.0001354, whisper_loss=0.08791, over 3665664.09 frames. ], batch size: 60, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:47:49,616 WARNING [optim.py:496] (0/4) Scaling gradients by 0.06644881516695023, model_norm_threshold=50.791358947753906 2024-08-20 18:47:49,774 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.846e+04, grad_sumsq=7.846e+04, orig_rms_sq=1.000e+00 2024-08-20 18:47:57,047 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 18:48:14,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4911790.0, ans=0.125 2024-08-20 18:48:29,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4911890.0, ans=0.0 2024-08-20 18:48:42,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4911990.0, ans=0.125 2024-08-20 18:48:52,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4911990.0, ans=0.125 2024-08-20 18:48:59,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4912090.0, ans=0.0 2024-08-20 18:49:07,041 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-20 18:49:17,686 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2250, loss[loss=0.1174, beats_loss=0.01097, ecapa_loss=0.0001383, whisper_loss=0.105, over 20382.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01033, ecapa_loss=0.0001357, whisper_loss=0.08907, over 3686815.80 frames. ], batch size: 79, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:49:23,379 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:49:32,406 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 18:49:40,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.61 vs. limit=22.5 2024-08-20 18:49:56,374 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 18:50:10,750 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 18:50:14,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.207e+01 2.453e+01 2.665e+01 7.644e+02, threshold=4.907e+01, percent-clipped=1.0 2024-08-20 18:50:16,328 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 26 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 18:50:22,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2024-08-20 18:50:37,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4912590.0, ans=0.0 2024-08-20 18:50:48,184 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2300, loss[loss=0.1031, beats_loss=0.01138, ecapa_loss=0.0001362, whisper_loss=0.09038, over 15247.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01037, ecapa_loss=0.0001368, whisper_loss=0.08947, over 3695468.91 frames. ], batch size: 59, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:51:15,597 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 18:51:17,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4912790.0, ans=0.125 2024-08-20 18:51:17,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4912790.0, ans=0.1 2024-08-20 18:51:18,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4912790.0, ans=0.95 2024-08-20 18:51:49,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4912990.0, ans=0.0 2024-08-20 18:51:49,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4912990.0, ans=0.0 2024-08-20 18:51:50,740 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 14 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-20 18:51:52,448 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 17 from LS+wenet, 8 from Vox, 27 fro AS 2024-08-20 18:51:54,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4912990.0, ans=0.1 2024-08-20 18:51:57,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4913090.0, ans=0.1 2024-08-20 18:52:08,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.35 vs. limit=10.0 2024-08-20 18:52:17,057 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2350, loss[loss=0.1119, beats_loss=0.01024, ecapa_loss=0.0001058, whisper_loss=0.1006, over 15832.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.000137, whisper_loss=0.08989, over 3705869.48 frames. 
], batch size: 59, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:52:38,711 WARNING [optim.py:496] (0/4) Scaling gradients by 0.04937309771776199, model_norm_threshold=49.067115783691406 2024-08-20 18:52:38,867 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.547e+05, grad_sumsq=4.707e+04, orig_rms_sq=3.286e+00 2024-08-20 18:53:05,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=12.0 2024-08-20 18:53:09,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4913390.0, ans=0.1 2024-08-20 18:53:14,670 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.360e+01 2.620e+01 2.900e+01 9.938e+02, threshold=5.241e+01, percent-clipped=3.0 2024-08-20 18:53:16,792 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-20 18:53:18,697 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 18:53:31,059 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-20 18:53:34,409 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 25 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-20 18:53:40,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4913590.0, ans=0.125 2024-08-20 18:53:49,346 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2400, loss[loss=0.0829, beats_loss=0.01403, ecapa_loss=0.0001122, whisper_loss=0.06775, over 22583.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001374, whisper_loss=0.09018, over 3736908.50 frames. 
], batch size: 92, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:54:00,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-08-20 18:54:01,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4913690.0, ans=0.125 2024-08-20 18:54:47,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4913990.0, ans=0.0 2024-08-20 18:55:07,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2024-08-20 18:55:09,038 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 18:55:13,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4914090.0, ans=0.1 2024-08-20 18:55:14,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2024-08-20 18:55:17,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4914190.0, ans=0.2 2024-08-20 18:55:19,318 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2450, loss[loss=0.1052, beats_loss=0.01066, ecapa_loss=0.000132, whisper_loss=0.09321, over 23591.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01024, ecapa_loss=0.0001371, whisper_loss=0.09055, over 3748147.69 frames. ], batch size: 94, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:55:48,390 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
26 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-20 18:55:52,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4914290.0, ans=0.0 2024-08-20 18:56:01,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4914390.0, ans=0.2 2024-08-20 18:56:07,651 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-20 18:56:07,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4914390.0, ans=0.125 2024-08-20 18:56:16,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.269e+01 2.495e+01 2.810e+01 4.376e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-20 18:56:53,400 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2500, loss[loss=0.1158, beats_loss=0.007847, ecapa_loss=0.0001541, whisper_loss=0.1065, over 21533.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001367, whisper_loss=0.09023, over 3785535.34 frames. ], batch size: 84, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:57:02,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.45 vs. limit=22.5 2024-08-20 18:57:17,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.53 vs. limit=10.0 2024-08-20 18:57:29,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4914890.0, ans=0.125 2024-08-20 18:57:29,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.15 vs. 
limit=15.0 2024-08-20 18:57:36,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4914890.0, ans=0.0 2024-08-20 18:57:58,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4914990.0, ans=0.0 2024-08-20 18:58:03,372 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.047e+05 2024-08-20 18:58:21,725 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2550, loss[loss=0.1006, beats_loss=0.008843, ecapa_loss=0.0001568, whisper_loss=0.09022, over 23204.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01032, ecapa_loss=0.0001372, whisper_loss=0.08977, over 3782448.40 frames. ], batch size: 92, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:58:27,228 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 29 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 18:59:01,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4915390.0, ans=0.125 2024-08-20 18:59:03,629 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 18:59:19,251 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.591e+01 2.358e+01 2.574e+01 2.752e+01 5.119e+01, threshold=5.148e+01, percent-clipped=1.0 2024-08-20 18:59:25,593 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 18:59:31,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4915490.0, ans=0.125 2024-08-20 18:59:38,852 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
28 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-20 18:59:55,381 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2600, loss[loss=0.09239, beats_loss=0.009824, ecapa_loss=0.0001614, whisper_loss=0.08095, over 17129.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01025, ecapa_loss=0.0001367, whisper_loss=0.09031, over 3805633.71 frames. ], batch size: 73, lr: 1.81e-03, grad_scale: 5.764607523034235e+17 2024-08-20 18:59:57,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4915690.0, ans=0.125 2024-08-20 19:00:07,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4915690.0, ans=0.1 2024-08-20 19:00:08,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=22.5 2024-08-20 19:00:28,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4915790.0, ans=0.95 2024-08-20 19:00:31,624 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 29 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-20 19:00:40,956 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 18 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-20 19:01:03,307 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 22 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-20 19:01:30,875 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2650, loss[loss=0.07747, beats_loss=0.01101, ecapa_loss=0.0001676, whisper_loss=0.06478, over 19337.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01015, ecapa_loss=0.0001385, whisper_loss=0.09088, over 3815746.55 frames. ], batch size: 78, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:01:36,049 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 19:01:41,432 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 35 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-20 19:01:48,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-20 19:01:49,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4916290.0, ans=0.0 2024-08-20 19:01:49,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4916290.0, ans=0.07 2024-08-20 19:02:02,525 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0383872389793396, model_norm_threshold=51.48301696777344 2024-08-20 19:02:02,680 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.236e+05, grad_sumsq=2.236e+05, orig_rms_sq=1.000e+00 2024-08-20 19:02:04,585 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 19:02:06,931 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 12 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-20 19:02:07,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4916390.0, ans=0.125 2024-08-20 19:02:20,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4916390.0, ans=0.125 2024-08-20 19:02:22,413 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
21 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-20 19:02:25,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.307e+01 2.525e+01 3.012e+01 1.341e+03, threshold=5.051e+01, percent-clipped=2.0 2024-08-20 19:02:25,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4916490.0, ans=0.0 2024-08-20 19:02:31,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4916490.0, ans=0.125 2024-08-20 19:02:32,939 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 19:02:33,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4916490.0, ans=0.125 2024-08-20 19:02:35,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4916490.0, ans=0.125 2024-08-20 19:02:40,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=4916590.0, ans=0.02 2024-08-20 19:02:50,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2024-08-20 19:02:59,056 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2700, loss[loss=0.1366, beats_loss=0.008012, ecapa_loss=0.0001219, whisper_loss=0.1274, over 16608.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01011, ecapa_loss=0.0001392, whisper_loss=0.09085, over 3774672.40 frames. ], batch size: 61, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:03:03,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-20 19:03:25,285 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
29 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 19:04:10,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-20 19:04:27,975 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2750, loss[loss=0.0776, beats_loss=0.008321, ecapa_loss=9.126e-05, whisper_loss=0.06837, over 17031.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01014, ecapa_loss=0.0001385, whisper_loss=0.09114, over 3821269.16 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:05:11,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4917390.0, ans=0.125 2024-08-20 19:05:12,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2024-08-20 19:05:28,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.332e+01 2.555e+01 2.897e+01 4.432e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-20 19:05:43,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.14 vs. 
limit=15.0 2024-08-20 19:05:53,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4917590.0, ans=0.125 2024-08-20 19:05:55,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4917590.0, ans=0.125 2024-08-20 19:06:03,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4917690.0, ans=0.0 2024-08-20 19:06:04,499 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2800, loss[loss=0.1088, beats_loss=0.01014, ecapa_loss=0.0001043, whisper_loss=0.09762, over 23431.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01016, ecapa_loss=0.0001375, whisper_loss=0.09147, over 3827609.78 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:06:38,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4917890.0, ans=0.0 2024-08-20 19:06:42,216 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 14 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-20 19:07:32,817 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2850, loss[loss=0.09452, beats_loss=0.01064, ecapa_loss=0.0001203, whisper_loss=0.08268, over 20762.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01022, ecapa_loss=0.0001367, whisper_loss=0.09055, over 3779816.38 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:07:33,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4918190.0, ans=0.2 2024-08-20 19:07:36,341 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 25 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 19:07:38,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.80 vs. 
limit=15.0 2024-08-20 19:07:42,508 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 19:07:49,521 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-20 19:08:24,420 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 19:08:29,383 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.343e+01 2.572e+01 2.859e+01 3.545e+01, threshold=5.143e+01, percent-clipped=0.0 2024-08-20 19:08:46,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4918590.0, ans=0.125 2024-08-20 19:09:03,471 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2900, loss[loss=0.07928, beats_loss=0.01176, ecapa_loss=0.0001011, whisper_loss=0.06651, over 12723.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01021, ecapa_loss=0.0001371, whisper_loss=0.09129, over 3785158.74 frames. ], batch size: 50, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:09:07,350 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-20 19:09:10,970 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 19:09:20,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4918790.0, ans=0.125 2024-08-20 19:09:23,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4918790.0, ans=0.025 2024-08-20 19:09:26,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.38 vs. 
limit=22.5 2024-08-20 19:09:54,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4918890.0, ans=0.0 2024-08-20 19:10:00,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4918990.0, ans=0.0 2024-08-20 19:10:04,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-08-20 19:10:32,715 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 2950, loss[loss=0.08405, beats_loss=0.01085, ecapa_loss=0.0001365, whisper_loss=0.07183, over 13732.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01025, ecapa_loss=0.000138, whisper_loss=0.09076, over 3781285.94 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:10:36,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4919190.0, ans=0.125 2024-08-20 19:10:36,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4919190.0, ans=0.125 2024-08-20 19:10:38,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4919190.0, ans=0.0 2024-08-20 19:11:06,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4919290.0, ans=0.0 2024-08-20 19:11:27,682 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
21 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-20 19:11:30,767 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.232e+01 2.550e+01 2.898e+01 2.799e+02, threshold=5.099e+01, percent-clipped=2.0 2024-08-20 19:12:05,890 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3000, loss[loss=0.1097, beats_loss=0.009165, ecapa_loss=0.0001676, whisper_loss=0.0989, over 19962.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01027, ecapa_loss=0.0001387, whisper_loss=0.09047, over 3801043.05 frames. ], batch size: 83, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:12:05,892 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 19:12:42,523 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on ASR_libri: loss=0.2544, beats_loss=0, ecapa_loss=0.000513, whisper_loss=0.2492, over 931116.00 frames. 2024-08-20 19:13:06,232 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on SV_voxceleb1: loss=0.003961, beats_loss=0, ecapa_loss=0.0003961, whisper_loss=0, over 944235.00 frames. 2024-08-20 19:14:43,119 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.5361, 2.2597, 2.0785, 1.9764], device='cuda:0') 2024-08-20 19:14:44,841 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-20 19:14:44,846 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 19:15:00,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4919790.0, ans=0.2 2024-08-20 19:15:11,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4919790.0, ans=0.125 2024-08-20 19:15:17,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2024-08-20 19:15:25,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4919890.0, ans=0.0 2024-08-20 19:15:31,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4919890.0, ans=0.1 2024-08-20 19:15:38,247 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-492000.pt 2024-08-20 19:15:56,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4920090.0, ans=0.1 2024-08-20 19:16:08,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4920090.0, ans=0.125 2024-08-20 19:16:15,031 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3050, loss[loss=0.1174, beats_loss=0.009536, ecapa_loss=0.0001478, whisper_loss=0.1064, over 21822.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.0001379, whisper_loss=0.09, over 3821363.42 frames. 
], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:16:20,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2024-08-20 19:16:26,908 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 36 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-20 19:16:43,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4920290.0, ans=0.0 2024-08-20 19:16:55,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4920390.0, ans=10.0 2024-08-20 19:17:00,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2024-08-20 19:17:03,827 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-20 19:17:09,406 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.284e+01 2.588e+01 2.897e+01 2.080e+02, threshold=5.176e+01, percent-clipped=1.0 2024-08-20 19:17:16,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4920490.0, ans=0.2 2024-08-20 19:17:17,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4920490.0, ans=0.0 2024-08-20 19:17:38,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4920590.0, ans=0.125 2024-08-20 19:17:41,203 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3100, loss[loss=0.06929, beats_loss=0.009785, ecapa_loss=0.0001766, whisper_loss=0.05774, over 14326.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001381, whisper_loss=0.08995, over 3801322.47 frames. 
], batch size: 60, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:17:41,438 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-20 19:17:41,649 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:17:44,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4920690.0, ans=0.1 2024-08-20 19:17:58,557 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-20 19:18:17,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.99 vs. limit=10.0 2024-08-20 19:18:23,985 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 12 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-20 19:18:24,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4920890.0, ans=0.1 2024-08-20 19:19:02,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=4921090.0, ans=0.1 2024-08-20 19:19:11,048 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3150, loss[loss=0.1, beats_loss=0.009597, ecapa_loss=0.0001443, whisper_loss=0.089, over 22478.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.09069, over 3849989.78 frames. 
], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:19:11,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4921190.0, ans=0.125 2024-08-20 19:19:52,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4921390.0, ans=0.0 2024-08-20 19:20:06,005 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.207e+01 2.457e+01 2.685e+01 3.583e+01, threshold=4.914e+01, percent-clipped=0.0 2024-08-20 19:20:11,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4921490.0, ans=0.0 2024-08-20 19:20:20,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4921590.0, ans=0.1 2024-08-20 19:20:23,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4921590.0, ans=0.2 2024-08-20 19:20:35,081 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 19:20:38,065 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3200, loss[loss=0.1152, beats_loss=0.009105, ecapa_loss=0.0001658, whisper_loss=0.1044, over 22152.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.09059, over 3830306.16 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:20:40,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4921690.0, ans=0.125 2024-08-20 19:20:59,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. 
limit=15.0 2024-08-20 19:21:07,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4921790.0, ans=0.125 2024-08-20 19:21:38,936 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.112e-02 2024-08-20 19:21:44,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4921990.0, ans=0.1 2024-08-20 19:21:55,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4922090.0, ans=0.125 2024-08-20 19:22:03,073 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3250, loss[loss=0.07396, beats_loss=0.01385, ecapa_loss=0.0001233, whisper_loss=0.05888, over 13444.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.000139, whisper_loss=0.08976, over 3830410.59 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:22:03,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.47 vs. limit=6.0 2024-08-20 19:22:12,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2024-08-20 19:22:15,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4922190.0, ans=0.125 2024-08-20 19:22:23,990 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 19:22:28,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=22.5 2024-08-20 19:22:44,854 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.350e+01 2024-08-20 19:22:49,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4922390.0, ans=0.125 2024-08-20 19:22:56,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.200e+01 2.511e+01 2.776e+01 3.425e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-20 19:23:13,595 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-20 19:23:25,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-20 19:23:28,486 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3300, loss[loss=0.07072, beats_loss=0.0111, ecapa_loss=0.0001416, whisper_loss=0.05821, over 15533.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001398, whisper_loss=0.0904, over 3830626.53 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:23:34,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. limit=10.0 2024-08-20 19:23:51,143 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 19:23:57,117 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-20 19:24:19,862 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 19:24:25,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4922990.0, ans=0.125 2024-08-20 19:24:34,944 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
20 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 19:24:43,834 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 19:24:55,510 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3350, loss[loss=0.09669, beats_loss=0.009204, ecapa_loss=0.000139, whisper_loss=0.08609, over 17394.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.08999, over 3774376.80 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:25:04,382 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 19:25:18,568 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:25:20,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4923290.0, ans=0.125 2024-08-20 19:25:27,294 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-20 19:25:32,561 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 20 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 19:25:36,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4923390.0, ans=0.125 2024-08-20 19:25:41,042 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
15 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 19:25:49,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.220e+01 2.419e+01 2.738e+01 3.918e+01, threshold=4.837e+01, percent-clipped=0.0 2024-08-20 19:26:06,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4923590.0, ans=0.125 2024-08-20 19:26:10,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4923590.0, ans=0.0 2024-08-20 19:26:21,889 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3400, loss[loss=0.07785, beats_loss=0.01121, ecapa_loss=0.0001322, whisper_loss=0.06532, over 19848.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001399, whisper_loss=0.08938, over 3808097.83 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:26:29,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4923690.0, ans=0.2 2024-08-20 19:26:29,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4923690.0, ans=0.125 2024-08-20 19:26:39,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2024-08-20 19:27:11,262 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
30 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-20 19:27:11,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4923890.0, ans=0.0 2024-08-20 19:27:16,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4923990.0, ans=0.125 2024-08-20 19:27:32,589 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.725e-02 2024-08-20 19:27:41,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4924090.0, ans=0.125 2024-08-20 19:27:48,644 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3450, loss[loss=0.0862, beats_loss=0.01291, ecapa_loss=0.0001163, whisper_loss=0.07213, over 21938.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.08944, over 3824611.04 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:28:22,158 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 21 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-20 19:28:42,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.395e+01 2.734e+01 3.067e+01 2.505e+02, threshold=5.467e+01, percent-clipped=4.0 2024-08-20 19:28:46,399 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-20 19:28:48,203 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 19:28:52,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4924490.0, ans=0.125 2024-08-20 19:29:03,486 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
36 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 19:29:08,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4924590.0, ans=0.2 2024-08-20 19:29:12,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2024-08-20 19:29:15,162 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3500, loss[loss=0.08384, beats_loss=0.01028, ecapa_loss=0.0001776, whisper_loss=0.07178, over 20029.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001391, whisper_loss=0.08923, over 3854185.33 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:29:23,549 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 18 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-20 19:29:32,762 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:29:36,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4924790.0, ans=0.125 2024-08-20 19:29:45,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4924790.0, ans=0.125 2024-08-20 19:29:56,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-08-20 19:30:08,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4924990.0, ans=0.2 2024-08-20 19:30:33,692 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 19:30:42,349 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3550, loss[loss=0.09341, beats_loss=0.009961, ecapa_loss=0.0001129, whisper_loss=0.08232, over 18765.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001401, whisper_loss=0.08979, over 3833358.58 frames. ], batch size: 70, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:30:42,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4925190.0, ans=0.125 2024-08-20 19:30:58,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4925290.0, ans=0.125 2024-08-20 19:31:02,026 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 19:31:05,936 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 19:31:07,395 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 34 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 19:31:21,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-20 19:31:34,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. limit=22.5 2024-08-20 19:31:36,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.289e+01 2.465e+01 2.729e+01 3.504e+01, threshold=4.930e+01, percent-clipped=0.0 2024-08-20 19:31:58,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4925590.0, ans=0.2 2024-08-20 19:32:09,075 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3600, loss[loss=0.1157, beats_loss=0.009122, ecapa_loss=0.0001505, whisper_loss=0.105, over 16127.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001389, whisper_loss=0.08999, over 3819221.00 frames. ], batch size: 67, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:32:11,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4925690.0, ans=0.1 2024-08-20 19:32:18,872 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=12.0 2024-08-20 19:32:26,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4925790.0, ans=0.125 2024-08-20 19:32:33,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4925790.0, ans=0.0 2024-08-20 19:32:35,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4925790.0, ans=0.125 2024-08-20 19:32:46,654 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 35 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-20 19:33:07,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4925990.0, ans=0.0 2024-08-20 19:33:24,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4926090.0, ans=0.0 2024-08-20 19:33:33,199 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 30 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-20 19:33:34,574 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3650, loss[loss=0.1157, beats_loss=0.008737, ecapa_loss=0.0001469, whisper_loss=0.1055, over 20628.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.0894, over 3841029.63 frames. 
], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:33:36,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4926190.0, ans=0.1 2024-08-20 19:33:40,626 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:33:55,112 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-20 19:34:03,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4926290.0, ans=0.1 2024-08-20 19:34:07,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4926290.0, ans=0.0 2024-08-20 19:34:28,497 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.676e+01 2.198e+01 2.421e+01 2.738e+01 4.465e+02, threshold=4.843e+01, percent-clipped=1.0 2024-08-20 19:34:41,375 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 19:34:51,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4926590.0, ans=0.125 2024-08-20 19:35:01,487 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3700, loss[loss=0.08979, beats_loss=0.01084, ecapa_loss=0.0001023, whisper_loss=0.07793, over 19115.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.000138, whisper_loss=0.08911, over 3833442.45 frames. 
], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:35:13,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4926690.0, ans=0.125 2024-08-20 19:35:22,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-20 19:35:26,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4926790.0, ans=0.125 2024-08-20 19:35:44,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-20 19:36:09,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=12.0 2024-08-20 19:36:11,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4927090.0, ans=0.1 2024-08-20 19:36:29,325 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3750, loss[loss=0.1001, beats_loss=0.009934, ecapa_loss=0.0001231, whisper_loss=0.0889, over 21274.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001382, whisper_loss=0.08937, over 3826709.75 frames. ], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:36:30,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4927190.0, ans=0.0 2024-08-20 19:36:30,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4927190.0, ans=0.0 2024-08-20 19:36:35,038 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
26 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-20 19:36:35,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4927190.0, ans=0.2 2024-08-20 19:36:37,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4927190.0, ans=0.125 2024-08-20 19:36:41,717 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 19:36:44,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-08-20 19:37:14,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4927390.0, ans=0.125 2024-08-20 19:37:19,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4927490.0, ans=0.1 2024-08-20 19:37:22,987 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.233e+01 2.505e+01 2.774e+01 5.527e+01, threshold=5.010e+01, percent-clipped=2.0 2024-08-20 19:37:29,956 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 26 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 19:37:45,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4927590.0, ans=0.2 2024-08-20 19:37:49,478 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 19:37:53,642 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 19:37:55,092 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3800, loss[loss=0.09374, beats_loss=0.01043, ecapa_loss=0.0001596, whisper_loss=0.08172, over 13203.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.000138, whisper_loss=0.08972, over 3812816.13 frames. ], batch size: 55, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:37:59,126 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-20 19:38:02,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4927690.0, ans=0.0 2024-08-20 19:38:05,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-20 19:38:07,993 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 19:38:26,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4927790.0, ans=0.125 2024-08-20 19:38:36,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4927890.0, ans=0.0 2024-08-20 19:38:43,916 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 33 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 19:38:46,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.83 vs. limit=10.0 2024-08-20 19:39:01,321 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 21 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-20 19:39:21,971 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3850, loss[loss=0.09281, beats_loss=0.009726, ecapa_loss=0.0001437, whisper_loss=0.08165, over 18050.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.000139, whisper_loss=0.09058, over 3821263.64 frames. 
], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:39:24,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4928190.0, ans=0.125 2024-08-20 19:39:29,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4928190.0, ans=0.0 2024-08-20 19:39:53,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4928290.0, ans=0.1 2024-08-20 19:40:08,597 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-20 19:40:11,955 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-20 19:40:16,895 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.370e+01 2.629e+01 2.963e+01 4.700e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-20 19:40:24,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4928490.0, ans=0.1 2024-08-20 19:40:43,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4928590.0, ans=15.0 2024-08-20 19:40:49,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4928690.0, ans=0.125 2024-08-20 19:40:50,860 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3900, loss[loss=0.09992, beats_loss=0.01065, ecapa_loss=0.0001553, whisper_loss=0.08772, over 17088.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001384, whisper_loss=0.08999, over 3791170.34 frames. ], batch size: 71, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:40:51,024 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
13 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 19:40:52,342 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 19:40:52,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4928690.0, ans=0.07 2024-08-20 19:41:02,948 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 19:41:13,677 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 19:41:26,046 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 19:41:46,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4928990.0, ans=0.125 2024-08-20 19:41:56,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4928990.0, ans=0.125 2024-08-20 19:42:08,642 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 19:42:16,731 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 3950, loss[loss=0.09094, beats_loss=0.01395, ecapa_loss=0.000151, whisper_loss=0.07547, over 22020.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001387, whisper_loss=0.09044, over 3816200.50 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:42:24,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4929190.0, ans=0.0 2024-08-20 19:42:38,732 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 19:42:47,715 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
20 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-20 19:43:00,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4929390.0, ans=0.125 2024-08-20 19:43:11,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.377e+01 2.625e+01 2.908e+01 3.824e+01, threshold=5.250e+01, percent-clipped=0.0 2024-08-20 19:43:20,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4929490.0, ans=0.2 2024-08-20 19:43:23,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4929490.0, ans=0.1 2024-08-20 19:43:23,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4929490.0, ans=0.125 2024-08-20 19:43:44,290 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4000, loss[loss=0.101, beats_loss=0.01123, ecapa_loss=0.0001425, whisper_loss=0.08839, over 18900.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01036, ecapa_loss=0.0001389, whisper_loss=0.09108, over 3865706.86 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:44:13,190 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-20 19:44:13,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4929790.0, ans=0.5 2024-08-20 19:44:15,232 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 19:44:15,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4929790.0, ans=0.0 2024-08-20 19:44:16,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4929790.0, ans=0.0 2024-08-20 19:45:14,023 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4050, loss[loss=0.09348, beats_loss=0.01034, ecapa_loss=0.00014, whisper_loss=0.08174, over 21429.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01028, ecapa_loss=0.0001396, whisper_loss=0.09171, over 3883793.72 frames. ], batch size: 87, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:45:21,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4930190.0, ans=0.0 2024-08-20 19:45:45,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.29 vs. 
limit=15.0 2024-08-20 19:46:02,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4930390.0, ans=0.125 2024-08-20 19:46:04,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4930390.0, ans=0.05 2024-08-20 19:46:08,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4930490.0, ans=0.125 2024-08-20 19:46:11,279 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.272e+01 2.496e+01 2.748e+01 3.675e+01, threshold=4.991e+01, percent-clipped=0.0 2024-08-20 19:46:18,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4930490.0, ans=0.1 2024-08-20 19:46:27,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4930590.0, ans=0.1 2024-08-20 19:46:31,409 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 31 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-20 19:46:44,569 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4100, loss[loss=0.1013, beats_loss=0.009506, ecapa_loss=0.0001523, whisper_loss=0.09029, over 22611.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01029, ecapa_loss=0.0001393, whisper_loss=0.09198, over 3881712.73 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:46:56,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4930690.0, ans=0.125 2024-08-20 19:47:25,326 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 
32 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-20 19:47:43,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4930990.0, ans=0.2 2024-08-20 19:47:48,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-20 19:47:50,386 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 19:47:52,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4930990.0, ans=0.0 2024-08-20 19:47:59,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4931090.0, ans=0.05 2024-08-20 19:48:12,582 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4150, loss[loss=0.1067, beats_loss=0.01031, ecapa_loss=0.0001356, whisper_loss=0.09505, over 22678.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01025, ecapa_loss=0.0001397, whisper_loss=0.09175, over 3896923.09 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:48:20,641 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-20 19:48:29,269 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 24 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-20 19:48:59,279 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-20 19:49:09,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.357e+01 2.563e+01 2.804e+01 4.051e+01, threshold=5.127e+01, percent-clipped=0.0 2024-08-20 19:49:27,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4931590.0, ans=0.125 2024-08-20 19:49:41,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2024-08-20 19:49:42,298 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4200, loss[loss=0.1106, beats_loss=0.01101, ecapa_loss=0.0001221, whisper_loss=0.09839, over 16918.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01031, ecapa_loss=0.0001403, whisper_loss=0.09123, over 3876632.09 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:49:42,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4931690.0, ans=0.125 2024-08-20 19:49:48,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-08-20 19:50:06,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4931790.0, ans=0.0 2024-08-20 19:50:06,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4931790.0, ans=0.125 2024-08-20 19:50:08,291 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 16 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-20 19:50:12,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4931790.0, ans=0.125 2024-08-20 19:50:38,725 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 19:50:52,919 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 19 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 19:51:00,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=12.0 2024-08-20 19:51:11,292 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4250, loss[loss=0.1078, beats_loss=0.008831, ecapa_loss=0.0001475, whisper_loss=0.09745, over 22540.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001409, whisper_loss=0.0905, over 3862104.08 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:51:15,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4932190.0, ans=0.125 2024-08-20 19:51:21,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=12.0 2024-08-20 19:51:30,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-20 19:51:31,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=22.5 2024-08-20 19:51:58,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4932390.0, ans=0.125 2024-08-20 19:52:08,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.607e+01 2.265e+01 2.580e+01 2.966e+01 3.429e+02, threshold=5.160e+01, percent-clipped=3.0 2024-08-20 19:52:09,016 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 19:52:10,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4932490.0, ans=0.0 2024-08-20 19:52:12,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4932490.0, ans=0.125 2024-08-20 19:52:18,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.91 vs. limit=5.0 2024-08-20 19:52:21,240 WARNING [optim.py:496] (0/4) Scaling gradients by 0.022736379876732826, model_norm_threshold=51.5983772277832 2024-08-20 19:52:21,399 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.129e+05, grad_sumsq=7.129e+05, orig_rms_sq=1.000e+00 2024-08-20 19:52:24,711 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 19:52:26,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4932590.0, ans=0.0 2024-08-20 19:52:35,034 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 24 from LS+wenet, 11 from Vox, 20 fro AS 2024-08-20 19:52:39,735 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4300, loss[loss=0.08294, beats_loss=0.009256, ecapa_loss=0.0001825, whisper_loss=0.07185, over 16961.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001406, whisper_loss=0.09049, over 3863660.71 frames. 
], batch size: 72, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:52:44,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4932690.0, ans=0.0 2024-08-20 19:52:56,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4932790.0, ans=0.0 2024-08-20 19:53:06,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4932790.0, ans=0.0 2024-08-20 19:54:08,030 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4350, loss[loss=0.09193, beats_loss=0.01048, ecapa_loss=0.0001491, whisper_loss=0.07996, over 20544.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01035, ecapa_loss=0.0001396, whisper_loss=0.09122, over 3852639.36 frames. ], batch size: 85, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:54:12,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4933190.0, ans=0.1 2024-08-20 19:54:19,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-08-20 19:54:20,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4933190.0, ans=0.125 2024-08-20 19:54:56,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4933390.0, ans=0.0 2024-08-20 19:54:57,754 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 19:55:03,252 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-20 19:55:04,292 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.347e+01 2.590e+01 2.980e+01 2.269e+03, threshold=5.180e+01, percent-clipped=1.0 2024-08-20 19:55:19,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4933590.0, ans=0.2 2024-08-20 19:55:22,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2024-08-20 19:55:25,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4933590.0, ans=0.1 2024-08-20 19:55:25,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4933590.0, ans=0.05 2024-08-20 19:55:32,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2024-08-20 19:55:35,530 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4400, loss[loss=0.06645, beats_loss=0.01139, ecapa_loss=0.0001719, whisper_loss=0.05334, over 12996.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.09064, over 3858384.61 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:55:50,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4933790.0, ans=0.0 2024-08-20 19:55:59,887 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-20 19:56:06,979 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
15 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 19:56:30,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4933990.0, ans=0.2 2024-08-20 19:56:39,991 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 14 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 19:56:44,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4933990.0, ans=0.1 2024-08-20 19:56:48,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4934090.0, ans=0.95 2024-08-20 19:57:01,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4934090.0, ans=0.125 2024-08-20 19:57:05,979 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4450, loss[loss=0.08691, beats_loss=0.0132, ecapa_loss=0.0001448, whisper_loss=0.07226, over 21584.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001387, whisper_loss=0.09021, over 3866378.46 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:57:43,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4934390.0, ans=0.125 2024-08-20 19:57:50,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.65 vs. limit=22.5 2024-08-20 19:57:56,476 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
33 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 19:57:58,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4934490.0, ans=0.0 2024-08-20 19:58:01,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.622e+01 2.339e+01 2.669e+01 2.965e+01 4.502e+01, threshold=5.338e+01, percent-clipped=0.0 2024-08-20 19:58:09,872 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 14 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-20 19:58:13,306 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=9.870e+00 2024-08-20 19:58:26,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=4934590.0, ans=0.05 2024-08-20 19:58:30,646 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4500, loss[loss=0.09133, beats_loss=0.01178, ecapa_loss=0.0001156, whisper_loss=0.0784, over 12933.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.09054, over 3867828.48 frames. ], batch size: 50, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:58:38,216 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-20 19:58:48,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. 
limit=15.0 2024-08-20 19:59:06,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4934890.0, ans=0.0 2024-08-20 19:59:17,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4934890.0, ans=0.09899494936611666 2024-08-20 19:59:54,418 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4550, loss[loss=0.1109, beats_loss=0.01034, ecapa_loss=0.0001317, whisper_loss=0.09927, over 13385.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001374, whisper_loss=0.08995, over 3815573.26 frames. ], batch size: 49, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 19:59:58,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4935190.0, ans=0.1 2024-08-20 20:00:05,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4935190.0, ans=0.2 2024-08-20 20:00:09,705 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 20:00:31,199 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:00:39,403 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-20 20:00:43,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4935390.0, ans=0.125 2024-08-20 20:00:50,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.956e+01 2.294e+01 2.455e+01 2.830e+01 3.953e+01, threshold=4.911e+01, percent-clipped=0.0 2024-08-20 20:01:22,202 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4600, loss[loss=0.08419, beats_loss=0.009654, ecapa_loss=0.0001612, whisper_loss=0.07292, over 13064.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001369, whisper_loss=0.08988, over 3816673.15 frames. ], batch size: 53, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:01:22,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4935690.0, ans=0.1 2024-08-20 20:01:22,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4935690.0, ans=0.125 2024-08-20 20:01:24,622 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-20 20:01:38,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=12.0 2024-08-20 20:02:06,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4935890.0, ans=0.125 2024-08-20 20:02:19,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4935990.0, ans=0.0 2024-08-20 20:02:20,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4935990.0, ans=0.07 2024-08-20 20:02:21,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4935990.0, ans=0.0 2024-08-20 20:02:30,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4936090.0, ans=0.125 2024-08-20 20:02:33,298 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
26 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-20 20:02:47,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4936190.0, ans=0.0 2024-08-20 20:02:48,901 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4650, loss[loss=0.05666, beats_loss=0.01347, ecapa_loss=0.0001481, whisper_loss=0.04171, over 12255.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001369, whisper_loss=0.08903, over 3801119.64 frames. ], batch size: 52, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:03:04,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-08-20 20:03:32,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4936390.0, ans=0.125 2024-08-20 20:03:43,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.237e+01 2.528e+01 2.827e+01 5.668e+01, threshold=5.055e+01, percent-clipped=2.0 2024-08-20 20:04:00,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0 2024-08-20 20:04:13,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4936690.0, ans=0.1 2024-08-20 20:04:15,050 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4700, loss[loss=0.1065, beats_loss=0.01101, ecapa_loss=0.0001167, whisper_loss=0.09431, over 14738.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01057, ecapa_loss=0.0001375, whisper_loss=0.08852, over 3826939.50 frames. 
], batch size: 58, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:04:22,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4936690.0, ans=0.125 2024-08-20 20:04:23,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4936690.0, ans=0.125 2024-08-20 20:04:23,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4936690.0, ans=0.125 2024-08-20 20:04:30,485 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.599e-01 2024-08-20 20:04:42,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4936790.0, ans=0.125 2024-08-20 20:04:42,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-08-20 20:05:02,707 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 20:05:06,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4936990.0, ans=0.125 2024-08-20 20:05:19,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4936990.0, ans=0.0 2024-08-20 20:05:35,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-20 20:05:39,983 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4750, loss[loss=0.1013, beats_loss=0.01182, ecapa_loss=0.0001191, whisper_loss=0.08832, over 23007.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01056, ecapa_loss=0.0001382, whisper_loss=0.08891, over 3851228.98 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:05:49,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4937190.0, ans=0.125 2024-08-20 20:06:00,060 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:06:37,057 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.283e+01 2.561e+01 2.830e+01 4.199e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-20 20:06:37,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4937490.0, ans=0.125 2024-08-20 20:06:39,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4937490.0, ans=0.0 2024-08-20 20:06:49,775 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 20:07:07,693 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 12 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-20 20:07:09,507 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4800, loss[loss=0.07254, beats_loss=0.007285, ecapa_loss=0.0001716, whisper_loss=0.06354, over 13451.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001398, whisper_loss=0.08941, over 3808827.50 frames. ], batch size: 54, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:07:13,113 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 16 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-20 20:07:16,463 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
19 from LS+wenet, 11 from Vox, 38 fro AS 2024-08-20 20:07:39,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4937790.0, ans=0.1 2024-08-20 20:07:49,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4937890.0, ans=0.95 2024-08-20 20:08:01,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4937990.0, ans=0.0 2024-08-20 20:08:02,679 WARNING [optim.py:496] (0/4) Scaling gradients by 0.025113865733146667, model_norm_threshold=51.21064758300781 2024-08-20 20:08:02,836 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.205e+06, grad_sumsq=1.205e+06, orig_rms_sq=1.000e+00 2024-08-20 20:08:05,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4937990.0, ans=6.0 2024-08-20 20:08:07,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4937990.0, ans=0.0 2024-08-20 20:08:08,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4937990.0, ans=0.125 2024-08-20 20:08:25,711 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 32 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 20:08:37,929 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4850, loss[loss=0.116, beats_loss=0.01015, ecapa_loss=0.0001364, whisper_loss=0.1044, over 22627.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.08957, over 3819044.94 frames. 
], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:08:40,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2024-08-20 20:08:44,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-20 20:08:59,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4938290.0, ans=0.125 2024-08-20 20:09:16,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4938390.0, ans=0.125 2024-08-20 20:09:34,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.379e+01 2.535e+01 2.834e+01 2.039e+03, threshold=5.069e+01, percent-clipped=1.0 2024-08-20 20:09:38,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4938490.0, ans=0.125 2024-08-20 20:09:47,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4938590.0, ans=0.125 2024-08-20 20:10:05,582 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4900, loss[loss=0.08002, beats_loss=0.01042, ecapa_loss=0.0001494, whisper_loss=0.06811, over 17561.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.08906, over 3800752.16 frames. ], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:10:07,451 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-20 20:10:22,082 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
31 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 20:10:56,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2024-08-20 20:11:15,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4939090.0, ans=0.2 2024-08-20 20:11:17,230 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 20:11:29,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4939090.0, ans=0.2 2024-08-20 20:11:34,799 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 4950, loss[loss=0.0976, beats_loss=0.01275, ecapa_loss=0.000127, whisper_loss=0.08358, over 23550.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001411, whisper_loss=0.0895, over 3814067.07 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:11:44,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4939190.0, ans=0.2 2024-08-20 20:11:59,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4939290.0, ans=0.0 2024-08-20 20:12:20,437 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-20 20:12:28,277 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
25 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 20:12:31,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4939490.0, ans=10.0 2024-08-20 20:12:32,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+01 2.356e+01 2.576e+01 2.948e+01 1.126e+02, threshold=5.153e+01, percent-clipped=1.0 2024-08-20 20:12:36,556 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 27 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 20:12:36,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4939490.0, ans=0.125 2024-08-20 20:12:54,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4939590.0, ans=0.1 2024-08-20 20:13:03,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4939690.0, ans=0.0 2024-08-20 20:13:05,119 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5000, loss[loss=0.1065, beats_loss=0.01078, ecapa_loss=0.0001437, whisper_loss=0.09427, over 22834.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001403, whisper_loss=0.08887, over 3824019.26 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:13:36,727 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-20 20:14:14,427 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 20:14:23,319 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-20 20:14:28,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.71 vs. 
limit=15.0 2024-08-20 20:14:36,054 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5050, loss[loss=0.08357, beats_loss=0.01346, ecapa_loss=0.0001251, whisper_loss=0.06885, over 16356.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0105, ecapa_loss=0.0001398, whisper_loss=0.08883, over 3809847.16 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:14:45,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4940190.0, ans=0.0 2024-08-20 20:15:12,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=12.0 2024-08-20 20:15:33,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.260e+01 2.507e+01 2.805e+01 5.478e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-20 20:15:37,043 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-20 20:15:45,434 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 20:15:56,082 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:16:04,929 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5100, loss[loss=0.09724, beats_loss=0.009149, ecapa_loss=0.0001028, whisper_loss=0.08707, over 15042.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001398, whisper_loss=0.08898, over 3807295.81 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:16:49,064 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 10 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 20:17:30,554 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
31 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 20:17:32,265 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5150, loss[loss=0.1167, beats_loss=0.01002, ecapa_loss=0.0001375, whisper_loss=0.1053, over 22481.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.0001406, whisper_loss=0.08912, over 3789059.58 frames. ], batch size: 86, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:18:14,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=15.0 2024-08-20 20:18:21,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4941390.0, ans=0.125 2024-08-20 20:18:27,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.219e+01 2.541e+01 2.868e+01 3.859e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-20 20:18:29,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4941490.0, ans=0.125 2024-08-20 20:18:57,906 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5200, loss[loss=0.1112, beats_loss=0.008903, ecapa_loss=0.0001672, whisper_loss=0.1006, over 15763.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001395, whisper_loss=0.08963, over 3793695.59 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:19:09,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4941690.0, ans=0.125 2024-08-20 20:19:13,963 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 32 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 20:19:15,698 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
16 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-20 20:19:44,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4941890.0, ans=0.125 2024-08-20 20:20:21,182 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 23 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 20:20:24,766 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 20:20:26,034 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5250, loss[loss=0.09982, beats_loss=0.01235, ecapa_loss=0.0001229, whisper_loss=0.08624, over 22096.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001396, whisper_loss=0.09048, over 3803032.17 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:20:32,810 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-20 20:20:36,208 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-20 20:20:47,397 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 12 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-20 20:20:56,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4942290.0, ans=0.035 2024-08-20 20:21:06,647 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-20 20:21:10,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4942390.0, ans=0.1 2024-08-20 20:21:19,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. 
limit=15.0 2024-08-20 20:21:22,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.241e+01 2.519e+01 2.751e+01 3.972e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 20:21:52,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4942690.0, ans=0.035 2024-08-20 20:21:53,481 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5300, loss[loss=0.09937, beats_loss=0.009607, ecapa_loss=0.0001237, whisper_loss=0.08852, over 18311.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001382, whisper_loss=0.09058, over 3762839.44 frames. ], batch size: 67, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:22:06,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4942690.0, ans=0.125 2024-08-20 20:22:06,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4942690.0, ans=0.0 2024-08-20 20:22:17,956 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 20:22:30,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2024-08-20 20:23:05,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4943090.0, ans=0.1 2024-08-20 20:23:07,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. 
limit=15.0 2024-08-20 20:23:16,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4943090.0, ans=0.1 2024-08-20 20:23:20,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4943190.0, ans=0.07 2024-08-20 20:23:21,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.78 vs. limit=15.0 2024-08-20 20:23:22,220 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5350, loss[loss=0.1064, beats_loss=0.009962, ecapa_loss=0.000125, whisper_loss=0.09516, over 22979.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001385, whisper_loss=0.09045, over 3770503.57 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:23:39,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4943290.0, ans=0.125 2024-08-20 20:23:42,339 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-20 20:23:48,996 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 20:23:53,968 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-20 20:24:03,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4943390.0, ans=0.1 2024-08-20 20:24:15,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4943490.0, ans=0.025 2024-08-20 20:24:19,654 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.338e+01 2.503e+01 2.804e+01 4.042e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-20 20:24:37,350 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 20:24:39,017 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 25 from LS+wenet, 8 from Vox, 25 fro AS 2024-08-20 20:24:51,258 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5400, loss[loss=0.1183, beats_loss=0.007128, ecapa_loss=0.0001555, whisper_loss=0.1096, over 17328.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01018, ecapa_loss=0.0001402, whisper_loss=0.09176, over 3772471.13 frames. ], batch size: 66, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:24:59,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=15.0 2024-08-20 20:25:04,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4943690.0, ans=0.125 2024-08-20 20:25:08,714 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 14 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 20:25:14,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4943790.0, ans=15.0 2024-08-20 20:25:21,528 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-20 20:25:23,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4943790.0, ans=0.2 2024-08-20 20:25:29,349 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 23 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-20 20:26:17,977 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5450, loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001483, whisper_loss=0.08964, over 22276.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01031, ecapa_loss=0.0001396, whisper_loss=0.09054, over 3794545.36 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:26:31,089 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-20 20:26:58,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4944390.0, ans=0.125 2024-08-20 20:27:09,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4944390.0, ans=0.0 2024-08-20 20:27:17,582 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.208e+01 2.419e+01 2.750e+01 4.613e+01, threshold=4.839e+01, percent-clipped=0.0 2024-08-20 20:27:26,777 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 20:27:37,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4944590.0, ans=0.125 2024-08-20 20:27:48,474 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5500, loss[loss=0.08922, beats_loss=0.01144, ecapa_loss=0.0001528, whisper_loss=0.07625, over 18412.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01032, ecapa_loss=0.0001393, whisper_loss=0.09047, over 3786192.73 frames. 
], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:27:57,879 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 39 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 20:28:19,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4944790.0, ans=0.2 2024-08-20 20:28:24,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4944890.0, ans=0.125 2024-08-20 20:28:39,337 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 20:28:42,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4944990.0, ans=0.1 2024-08-20 20:28:46,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-20 20:28:47,616 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 20:29:16,160 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5550, loss[loss=0.1192, beats_loss=0.006562, ecapa_loss=0.0001419, whisper_loss=0.1112, over 15128.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01028, ecapa_loss=0.0001394, whisper_loss=0.09098, over 3795509.73 frames. 
], batch size: 52, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:29:32,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4945290.0, ans=0.1 2024-08-20 20:29:38,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4945290.0, ans=0.0 2024-08-20 20:29:41,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4945290.0, ans=0.125 2024-08-20 20:29:49,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4945290.0, ans=0.125 2024-08-20 20:30:11,577 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-20 20:30:12,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.276e+01 2.520e+01 2.741e+01 3.796e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-20 20:30:36,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-08-20 20:30:44,036 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5600, loss[loss=0.1055, beats_loss=0.007779, ecapa_loss=0.000158, whisper_loss=0.09618, over 19504.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01026, ecapa_loss=0.0001407, whisper_loss=0.09112, over 3802022.58 frames. ], batch size: 82, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:30:48,296 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
26 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-20 20:30:55,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4945690.0, ans=0.1 2024-08-20 20:31:31,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4945890.0, ans=0.125 2024-08-20 20:31:33,257 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-20 20:31:40,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4945990.0, ans=10.0 2024-08-20 20:31:48,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4945990.0, ans=0.1 2024-08-20 20:31:59,074 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 16 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-20 20:31:59,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4946090.0, ans=0.07 2024-08-20 20:32:12,739 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5650, loss[loss=0.1019, beats_loss=0.01236, ecapa_loss=0.000128, whisper_loss=0.08829, over 23109.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01028, ecapa_loss=0.0001411, whisper_loss=0.09061, over 3828326.89 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:32:20,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4946190.0, ans=0.04949747468305833 2024-08-20 20:32:31,878 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 
21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-20 20:32:34,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4946290.0, ans=0.2 2024-08-20 20:32:44,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-08-20 20:32:52,009 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 22 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-20 20:32:58,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4946390.0, ans=0.125 2024-08-20 20:33:09,725 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.323e+01 2.509e+01 2.836e+01 4.746e+01, threshold=5.018e+01, percent-clipped=0.0 2024-08-20 20:33:14,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4946490.0, ans=0.0 2024-08-20 20:33:16,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4946490.0, ans=0.125 2024-08-20 20:33:43,477 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5700, loss[loss=0.09144, beats_loss=0.0127, ecapa_loss=0.0001415, whisper_loss=0.07732, over 20240.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0103, ecapa_loss=0.0001415, whisper_loss=0.09065, over 3844634.66 frames. 
], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:34:05,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4946790.0, ans=0.1 2024-08-20 20:34:22,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4946890.0, ans=0.0 2024-08-20 20:34:24,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4946890.0, ans=0.125 2024-08-20 20:34:30,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4946890.0, ans=0.0 2024-08-20 20:34:33,871 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 20:34:43,874 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.808e-03 2024-08-20 20:34:49,043 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 11 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-20 20:34:50,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4946990.0, ans=0.0 2024-08-20 20:34:54,285 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.411e+01 2024-08-20 20:35:08,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4947090.0, ans=0.1 2024-08-20 20:35:19,980 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5750, loss[loss=0.08214, beats_loss=0.01093, ecapa_loss=0.0001417, whisper_loss=0.0698, over 20094.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01035, ecapa_loss=0.0001411, whisper_loss=0.0904, over 3865306.54 frames. 
], batch size: 86, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:35:38,176 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-20 20:35:46,618 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-20 20:36:14,301 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 30 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-20 20:36:23,822 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.253e+01 2.565e+01 2.811e+01 3.552e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-20 20:36:31,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4947490.0, ans=0.125 2024-08-20 20:36:34,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4947490.0, ans=0.125 2024-08-20 20:36:39,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4947590.0, ans=0.0 2024-08-20 20:36:57,907 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5800, loss[loss=0.09805, beats_loss=0.00927, ecapa_loss=0.0001286, whisper_loss=0.08749, over 15290.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001422, whisper_loss=0.09045, over 3855375.00 frames. ], batch size: 60, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:36:58,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4947690.0, ans=0.0 2024-08-20 20:36:58,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4947690.0, ans=0.0 2024-08-20 20:37:08,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. 
limit=6.0 2024-08-20 20:37:19,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4947790.0, ans=0.125 2024-08-20 20:37:30,915 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 20:37:47,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4947890.0, ans=0.125 2024-08-20 20:37:51,237 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-20 20:38:06,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4947990.0, ans=0.125 2024-08-20 20:38:18,122 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 19 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 20:38:30,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4948090.0, ans=0.125 2024-08-20 20:38:36,425 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5850, loss[loss=0.08773, beats_loss=0.01204, ecapa_loss=0.0001575, whisper_loss=0.07412, over 20417.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01033, ecapa_loss=0.0001413, whisper_loss=0.09082, over 3855979.74 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:38:42,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4948190.0, ans=0.125 2024-08-20 20:38:59,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4948290.0, ans=0.0 2024-08-20 20:39:14,446 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 20:39:18,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4948390.0, ans=0.07 2024-08-20 20:39:19,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0 2024-08-20 20:39:25,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4948390.0, ans=0.125 2024-08-20 20:39:25,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4948390.0, ans=0.04949747468305833 2024-08-20 20:39:33,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.256e+01 2.476e+01 2.710e+01 3.923e+01, threshold=4.952e+01, percent-clipped=0.0 2024-08-20 20:39:40,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4948490.0, ans=0.125 2024-08-20 20:39:41,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4948490.0, ans=0.0 2024-08-20 20:39:46,213 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 16 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-20 20:39:50,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=4948590.0, ans=22.5 2024-08-20 20:39:50,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.42 vs. 
limit=22.5 2024-08-20 20:39:51,049 WARNING [optim.py:496] (0/4) Scaling gradients by 0.021723005920648575, model_norm_threshold=49.52134323120117 2024-08-20 20:39:51,221 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.306e+06, grad_sumsq=3.974e+05, orig_rms_sq=3.286e+00 2024-08-20 20:39:51,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4948590.0, ans=0.125 2024-08-20 20:39:53,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4948590.0, ans=0.125 2024-08-20 20:40:06,631 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5900, loss[loss=0.1138, beats_loss=0.01189, ecapa_loss=0.000136, whisper_loss=0.1005, over 18421.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001408, whisper_loss=0.09057, over 3821089.48 frames. ], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:40:16,419 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 20:40:21,842 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-20 20:40:21,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4948690.0, ans=0.125 2024-08-20 20:40:24,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4948790.0, ans=0.0 2024-08-20 20:40:25,527 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 20:40:43,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. 
limit=15.0 2024-08-20 20:41:08,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=22.5 2024-08-20 20:41:19,312 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 20:41:25,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0 2024-08-20 20:41:33,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-20 20:41:36,089 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 5950, loss[loss=0.1131, beats_loss=0.0111, ecapa_loss=0.0001121, whisper_loss=0.1009, over 22715.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.000141, whisper_loss=0.09014, over 3823783.52 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:41:37,069 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 18 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 20:41:43,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4949190.0, ans=0.125 2024-08-20 20:42:33,655 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.390e+01 2.635e+01 2.817e+01 2.280e+03, threshold=5.271e+01, percent-clipped=1.0 2024-08-20 20:42:44,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=22.5 2024-08-20 20:43:03,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.76 vs. 
limit=22.5 2024-08-20 20:43:05,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2024-08-20 20:43:06,091 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6000, loss[loss=0.1038, beats_loss=0.00802, ecapa_loss=0.0001884, whisper_loss=0.09391, over 13474.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001408, whisper_loss=0.09016, over 3808889.64 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:43:06,092 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 20:43:57,970 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005083, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 20:44:22,270 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on SV_voxceleb1: loss=0.003999, beats_loss=0, ecapa_loss=0.0003999, whisper_loss=0, over 944235.00 frames. 2024-08-20 20:45:57,816 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 20:45:57,820 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 20:46:26,696 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-20 20:46:39,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4949890.0, ans=0.2 2024-08-20 20:46:55,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4949990.0, ans=0.0 2024-08-20 20:47:29,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4950090.0, ans=0.0 2024-08-20 20:47:36,898 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6050, loss[loss=0.0774, beats_loss=0.01052, ecapa_loss=0.0001288, whisper_loss=0.06559, over 12875.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.08976, over 3785813.80 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:47:46,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4950190.0, ans=0.1 2024-08-20 20:48:41,172 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 26 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-20 20:48:53,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.341e+01 2.597e+01 2.876e+01 5.831e+01, threshold=5.193e+01, percent-clipped=1.0 2024-08-20 20:48:55,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-20 20:49:18,320 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 20:49:27,822 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6100, loss[loss=0.1136, beats_loss=0.007931, ecapa_loss=0.000127, whisper_loss=0.1044, over 17893.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.08941, over 3792971.37 frames. 
], batch size: 66, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:49:29,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-20 20:49:33,640 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 20 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-20 20:49:35,646 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-20 20:49:42,530 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 20:49:42,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4950690.0, ans=0.2 2024-08-20 20:49:44,946 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-20 20:49:58,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4950790.0, ans=0.07 2024-08-20 20:49:58,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4950790.0, ans=0.125 2024-08-20 20:50:01,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-08-20 20:50:03,139 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-20 20:50:41,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4950990.0, ans=0.09899494936611666 2024-08-20 20:50:50,194 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 27 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 20:50:54,426 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-20 20:51:00,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=22.5 2024-08-20 20:51:05,360 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 39 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-20 20:51:17,233 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6150, loss[loss=0.1016, beats_loss=0.009194, ecapa_loss=9.829e-05, whisper_loss=0.09145, over 13875.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001379, whisper_loss=0.08965, over 3767941.70 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 1.152921504606847e+18 2024-08-20 20:51:18,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4951190.0, ans=0.1 2024-08-20 20:51:53,289 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 20 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-20 20:52:00,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4951390.0, ans=0.0 2024-08-20 20:52:22,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-08-20 20:52:27,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.295e+01 2.472e+01 2.689e+01 4.282e+01, threshold=4.944e+01, percent-clipped=0.0 2024-08-20 20:52:56,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.74 vs. 
limit=15.0 2024-08-20 20:53:01,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4951590.0, ans=0.125 2024-08-20 20:53:03,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4951590.0, ans=0.125 2024-08-20 20:53:06,737 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6200, loss[loss=0.1057, beats_loss=0.00931, ecapa_loss=0.0001448, whisper_loss=0.09492, over 21526.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001381, whisper_loss=0.08943, over 3776700.28 frames. ], batch size: 89, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:53:10,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2024-08-20 20:53:18,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4951690.0, ans=0.5 2024-08-20 20:53:54,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4951890.0, ans=0.125 2024-08-20 20:53:55,932 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 20:54:56,591 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6250, loss[loss=0.1029, beats_loss=0.01143, ecapa_loss=0.0001501, whisper_loss=0.08998, over 22427.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01057, ecapa_loss=0.0001384, whisper_loss=0.08861, over 3795414.55 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:54:59,219 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
17 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 20:55:02,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4952190.0, ans=0.125 2024-08-20 20:55:19,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4952290.0, ans=0.95 2024-08-20 20:55:36,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2024-08-20 20:56:06,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.296e+01 2.528e+01 2.851e+01 2.776e+02, threshold=5.056e+01, percent-clipped=4.0 2024-08-20 20:56:44,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4952590.0, ans=0.2 2024-08-20 20:56:47,426 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6300, loss[loss=0.1084, beats_loss=0.01001, ecapa_loss=0.0001354, whisper_loss=0.09706, over 16056.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01057, ecapa_loss=0.0001395, whisper_loss=0.08852, over 3808591.96 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:56:57,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4952690.0, ans=0.125 2024-08-20 20:57:29,737 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 26 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-20 20:57:34,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4952790.0, ans=0.125 2024-08-20 20:57:56,583 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-20 20:58:43,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4953190.0, ans=0.125 2024-08-20 20:58:44,208 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6350, loss[loss=0.09596, beats_loss=0.01034, ecapa_loss=0.0001531, whisper_loss=0.0841, over 19559.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01049, ecapa_loss=0.0001399, whisper_loss=0.08874, over 3819246.74 frames. ], batch size: 78, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 20:58:45,258 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 21 from LS+wenet, 31 from Vox, 36 fro AS 2024-08-20 20:58:57,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=8.0 2024-08-20 20:59:12,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4953290.0, ans=0.5 2024-08-20 20:59:20,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4953290.0, ans=0.2 2024-08-20 20:59:52,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.372e+01 2.595e+01 2.941e+01 1.196e+02, threshold=5.191e+01, percent-clipped=6.0 2024-08-20 20:59:58,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-08-20 21:00:12,493 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 21:00:12,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4953590.0, ans=0.0 2024-08-20 21:00:13,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2024-08-20 21:00:26,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4953590.0, ans=0.125 2024-08-20 21:00:27,043 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.977e-01 2024-08-20 21:00:28,417 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 21:00:29,446 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6400, loss[loss=0.08644, beats_loss=0.01073, ecapa_loss=0.000133, whisper_loss=0.07438, over 16685.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01055, ecapa_loss=0.0001396, whisper_loss=0.08834, over 3810703.28 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:00:34,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4953690.0, ans=0.125 2024-08-20 21:00:55,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4953790.0, ans=0.125 2024-08-20 21:01:02,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4953790.0, ans=0.1 2024-08-20 21:01:08,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.47 vs. 
limit=15.0 2024-08-20 21:01:39,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4953990.0, ans=0.125 2024-08-20 21:01:50,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4954090.0, ans=0.0 2024-08-20 21:02:08,847 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6450, loss[loss=0.09793, beats_loss=0.01153, ecapa_loss=0.0001124, whisper_loss=0.08528, over 23217.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0105, ecapa_loss=0.0001382, whisper_loss=0.0891, over 3807575.76 frames. ], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:02:14,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-08-20 21:02:27,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4954290.0, ans=0.125 2024-08-20 21:02:59,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-08-20 21:03:08,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4954490.0, ans=0.125 2024-08-20 21:03:11,010 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.299e+01 2.553e+01 2.896e+01 1.351e+02, threshold=5.106e+01, percent-clipped=1.0 2024-08-20 21:03:14,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4954490.0, ans=0.125 2024-08-20 21:03:33,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.59 vs. 
limit=15.0 2024-08-20 21:03:38,152 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-20 21:03:44,647 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6500, loss[loss=0.1159, beats_loss=0.007917, ecapa_loss=0.0001201, whisper_loss=0.1068, over 14494.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001379, whisper_loss=0.089, over 3788739.46 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:04:25,423 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-20 21:04:41,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4954890.0, ans=10.0 2024-08-20 21:04:43,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4954890.0, ans=0.1 2024-08-20 21:04:53,444 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-20 21:05:02,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4954990.0, ans=0.0 2024-08-20 21:05:18,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4955090.0, ans=0.0 2024-08-20 21:05:18,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4955090.0, ans=0.0 2024-08-20 21:05:23,446 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
33 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 21:05:28,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4955090.0, ans=10.0 2024-08-20 21:05:40,971 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6550, loss[loss=0.1028, beats_loss=0.009666, ecapa_loss=0.0001417, whisper_loss=0.09171, over 19105.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001377, whisper_loss=0.08953, over 3787342.60 frames. ], batch size: 77, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:05:58,125 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 21:06:54,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0 2024-08-20 21:06:56,107 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 20 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 21:06:57,274 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.272e+01 2.491e+01 2.852e+01 4.089e+01, threshold=4.982e+01, percent-clipped=0.0 2024-08-20 21:07:03,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4955490.0, ans=0.1 2024-08-20 21:07:17,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4955590.0, ans=0.125 2024-08-20 21:07:26,684 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 16 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-20 21:07:37,682 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6600, loss[loss=0.1041, beats_loss=0.01072, ecapa_loss=0.0001359, whisper_loss=0.09198, over 18542.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.09038, over 3817386.94 frames. 
], batch size: 72, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:07:55,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4955690.0, ans=0.5 2024-08-20 21:08:35,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4955890.0, ans=0.125 2024-08-20 21:08:43,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4955890.0, ans=0.2 2024-08-20 21:09:24,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4955990.0, ans=0.125 2024-08-20 21:09:43,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4956090.0, ans=0.1 2024-08-20 21:09:47,609 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6650, loss[loss=0.09633, beats_loss=0.01237, ecapa_loss=0.0001326, whisper_loss=0.08263, over 20224.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001388, whisper_loss=0.09103, over 3842040.90 frames. ], batch size: 81, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:10:30,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4956390.0, ans=0.125 2024-08-20 21:10:47,468 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.323e+01 2.536e+01 2.903e+01 4.430e+01, threshold=5.073e+01, percent-clipped=0.0 2024-08-20 21:11:14,005 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-20 21:11:18,825 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6700, loss[loss=0.09651, beats_loss=0.01253, ecapa_loss=0.0001296, whisper_loss=0.08269, over 20771.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01029, ecapa_loss=0.0001405, whisper_loss=0.09163, over 3857196.91 frames. ], batch size: 84, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:11:21,161 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-20 21:11:45,973 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 19 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 21:11:52,719 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-20 21:12:01,442 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-20 21:12:13,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4956990.0, ans=0.04949747468305833 2024-08-20 21:12:20,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4956990.0, ans=0.0 2024-08-20 21:12:36,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4957090.0, ans=0.125 2024-08-20 21:12:41,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4957090.0, ans=0.0 2024-08-20 21:12:45,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4957190.0, ans=0.125 2024-08-20 21:12:46,390 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6750, loss[loss=0.1045, beats_loss=0.009183, ecapa_loss=0.0001164, whisper_loss=0.09413, over 17666.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01031, ecapa_loss=0.0001405, whisper_loss=0.09107, over 3851755.14 frames. 
], batch size: 67, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:12:50,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4957190.0, ans=0.1 2024-08-20 21:12:53,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4957190.0, ans=0.05 2024-08-20 21:12:57,405 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-20 21:12:57,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4957190.0, ans=0.0 2024-08-20 21:13:04,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4957290.0, ans=0.0 2024-08-20 21:13:09,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4957290.0, ans=0.125 2024-08-20 21:13:13,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=12.0 2024-08-20 21:13:42,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4957490.0, ans=0.015 2024-08-20 21:13:44,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.394e+01 2.668e+01 3.101e+01 4.157e+01, threshold=5.336e+01, percent-clipped=0.0 2024-08-20 21:14:12,994 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6800, loss[loss=0.1091, beats_loss=0.00917, ecapa_loss=0.0001525, whisper_loss=0.09837, over 22198.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0103, ecapa_loss=0.0001404, whisper_loss=0.09094, over 3858308.90 frames. 
], batch size: 92, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:14:15,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2024-08-20 21:14:16,470 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 14 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-20 21:14:24,823 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-20 21:14:45,708 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-20 21:15:08,373 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 14 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-20 21:15:12,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4957990.0, ans=0.125 2024-08-20 21:15:14,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4957990.0, ans=0.2 2024-08-20 21:15:18,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4957990.0, ans=0.125 2024-08-20 21:15:28,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4958090.0, ans=0.125 2024-08-20 21:15:34,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4958090.0, ans=0.125 2024-08-20 21:15:39,289 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6850, loss[loss=0.0946, beats_loss=0.01118, ecapa_loss=0.0001649, whisper_loss=0.08177, over 22014.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01028, ecapa_loss=0.00014, whisper_loss=0.09161, over 3856517.47 frames. 
], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:15:39,740 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-20 21:15:47,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4958190.0, ans=0.0 2024-08-20 21:16:14,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4958390.0, ans=0.125 2024-08-20 21:16:15,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2024-08-20 21:16:32,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4958490.0, ans=0.1 2024-08-20 21:16:36,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.275e+01 2.461e+01 2.676e+01 7.935e+01, threshold=4.923e+01, percent-clipped=1.0 2024-08-20 21:16:57,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4958590.0, ans=0.125 2024-08-20 21:17:00,858 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-20 21:17:06,146 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6900, loss[loss=0.1046, beats_loss=0.01022, ecapa_loss=0.0001352, whisper_loss=0.09306, over 23496.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.09155, over 3883410.90 frames. ], batch size: 93, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:17:12,891 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 21:17:28,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4958790.0, ans=0.2 2024-08-20 21:17:55,196 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 28 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 21:18:06,660 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-20 21:18:15,530 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-20 21:18:23,846 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 28 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 21:18:29,360 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-20 21:18:31,991 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 6950, loss[loss=0.08391, beats_loss=0.00856, ecapa_loss=0.0001411, whisper_loss=0.07394, over 15574.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01034, ecapa_loss=0.0001409, whisper_loss=0.09101, over 3867264.91 frames. ], batch size: 59, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:18:32,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4959190.0, ans=0.125 2024-08-20 21:18:33,909 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 16 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-20 21:18:53,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4959290.0, ans=0.125 2024-08-20 21:19:07,628 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-20 21:19:11,496 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 26 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-20 21:19:13,588 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
17 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-20 21:19:16,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2024-08-20 21:19:29,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.291e+01 2.502e+01 2.810e+01 1.652e+02, threshold=5.004e+01, percent-clipped=1.0 2024-08-20 21:19:35,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4959490.0, ans=0.2 2024-08-20 21:19:58,701 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7000, loss[loss=0.08998, beats_loss=0.009126, ecapa_loss=0.0001718, whisper_loss=0.07914, over 12831.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.08989, over 3863715.47 frames. ], batch size: 52, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:20:24,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4959790.0, ans=0.125 2024-08-20 21:20:24,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4959790.0, ans=0.125 2024-08-20 21:20:52,019 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-496000.pt 2024-08-20 21:21:24,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4960090.0, ans=0.1 2024-08-20 21:21:29,270 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7050, loss[loss=0.1005, beats_loss=0.0119, ecapa_loss=0.0001158, whisper_loss=0.08745, over 21964.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001398, whisper_loss=0.08963, over 3882012.20 frames. ], batch size: 88, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:21:35,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4960190.0, ans=0.125 2024-08-20 21:21:40,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4960190.0, ans=0.125 2024-08-20 21:21:59,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=12.0 2024-08-20 21:22:05,413 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 26 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-20 21:22:08,832 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 21:22:24,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4960490.0, ans=0.0 2024-08-20 21:22:25,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.286e+01 2.529e+01 2.848e+01 4.260e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-20 21:22:44,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4960590.0, ans=0.0 2024-08-20 21:22:55,538 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7100, loss[loss=0.1064, beats_loss=0.01165, ecapa_loss=0.0001388, whisper_loss=0.09333, over 21906.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001401, whisper_loss=0.09081, over 3863828.94 frames. ], batch size: 90, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:22:56,310 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
25 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-20 21:23:06,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.94 vs. limit=12.0 2024-08-20 21:23:13,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-20 21:23:16,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4960790.0, ans=0.1 2024-08-20 21:23:53,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4960990.0, ans=0.0 2024-08-20 21:23:57,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2024-08-20 21:24:10,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4961090.0, ans=0.07 2024-08-20 21:24:13,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2024-08-20 21:24:21,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4961090.0, ans=0.0 2024-08-20 21:24:24,146 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7150, loss[loss=0.1125, beats_loss=0.009866, ecapa_loss=0.000138, whisper_loss=0.1012, over 18337.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001397, whisper_loss=0.09053, over 3824262.64 frames. 
], batch size: 73, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:24:25,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4961190.0, ans=0.0 2024-08-20 21:24:25,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0 2024-08-20 21:24:56,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4961290.0, ans=0.125 2024-08-20 21:25:19,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4961490.0, ans=0.0 2024-08-20 21:25:19,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4961490.0, ans=0.1 2024-08-20 21:25:21,935 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.246e+01 2.471e+01 2.747e+01 3.291e+02, threshold=4.942e+01, percent-clipped=1.0 2024-08-20 21:25:29,728 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 from AS 2024-08-20 21:25:33,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4961590.0, ans=0.95 2024-08-20 21:25:33,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4961590.0, ans=0.125 2024-08-20 21:25:36,738 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 from AS 2024-08-20 21:25:49,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs.
limit=15.0 2024-08-20 21:25:51,815 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7200, loss[loss=0.08965, beats_loss=0.009563, ecapa_loss=0.0001357, whisper_loss=0.07873, over 13769.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001389, whisper_loss=0.0904, over 3826678.66 frames. ], batch size: 55, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:25:52,461 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 20 from Vox, 31 from AS 2024-08-20 21:25:52,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4961690.0, ans=0.0 2024-08-20 21:25:52,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4961690.0, ans=0.1 2024-08-20 21:26:14,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=12.0 2024-08-20 21:26:24,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4961790.0, ans=0.1 2024-08-20 21:26:31,187 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 from AS 2024-08-20 21:26:32,297 WARNING [optim.py:496] (0/4) Scaling gradients by 0.05625057592988014, model_norm_threshold=49.41666793823242 2024-08-20 21:26:32,453 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.330e+05, grad_sumsq=1.330e+05, orig_rms_sq=1.000e+00 2024-08-20 21:26:44,993 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 27 from LS+wenet, 28 from Vox, 38 from AS 2024-08-20 21:27:13,140 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts.
15 from LS+wenet, 17 from Vox, 26 from AS 2024-08-20 21:27:20,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4962190.0, ans=0.2 2024-08-20 21:27:21,187 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7250, loss[loss=0.0991, beats_loss=0.01196, ecapa_loss=9.741e-05, whisper_loss=0.08616, over 13980.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001399, whisper_loss=0.09081, over 3817222.62 frames. ], batch size: 51, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:27:47,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4962290.0, ans=0.125 2024-08-20 21:27:53,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-20 21:27:57,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4962390.0, ans=0.125 2024-08-20 21:27:59,569 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 16 from Vox, 36 from AS 2024-08-20 21:28:01,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-08-20 21:28:16,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4962490.0, ans=0.0 2024-08-20 21:28:18,698 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.295e+01 2.557e+01 2.872e+01 8.785e+02, threshold=5.114e+01, percent-clipped=5.0 2024-08-20 21:28:26,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4962490.0, ans=0.125 2024-08-20 21:28:38,434 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts.
25 from LS+wenet, 22 from Vox, 46 from AS 2024-08-20 21:28:38,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4962590.0, ans=0.2 2024-08-20 21:28:49,192 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7300, loss[loss=0.09143, beats_loss=0.009345, ecapa_loss=0.0001317, whisper_loss=0.08076, over 16016.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01028, ecapa_loss=0.0001407, whisper_loss=0.09136, over 3831759.69 frames. ], batch size: 64, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:28:57,680 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 22 from LS+wenet, 22 from Vox, 17 from AS 2024-08-20 21:29:01,156 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 20 from LS+wenet, 20 from Vox, 29 from AS 2024-08-20 21:29:04,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4962790.0, ans=0.2 2024-08-20 21:29:28,388 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 22 from LS+wenet, 33 from Vox, 30 from AS 2024-08-20 21:30:15,012 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7350, loss[loss=0.1109, beats_loss=0.01036, ecapa_loss=0.000116, whisper_loss=0.09938, over 20922.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01023, ecapa_loss=0.0001413, whisper_loss=0.09135, over 3832863.46 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:30:32,892 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 23 from LS+wenet, 13 from Vox, 23 from AS 2024-08-20 21:31:11,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.257e+01 2.510e+01 2.739e+01 2.616e+02, threshold=5.019e+01, percent-clipped=1.0 2024-08-20 21:31:17,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4963490.0, ans=0.125 2024-08-20 21:31:19,262 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts.
21 from LS+wenet, 22 from Vox, 45 from AS 2024-08-20 21:31:26,255 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 27 from LS+wenet, 16 from Vox, 27 from AS 2024-08-20 21:31:28,106 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 17 from Vox, 43 from AS 2024-08-20 21:31:31,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4963590.0, ans=0.035 2024-08-20 21:31:33,057 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS 2024-08-20 21:31:35,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0 2024-08-20 21:31:40,856 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7400, loss[loss=0.09597, beats_loss=0.01136, ecapa_loss=0.000116, whisper_loss=0.08344, over 20524.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01028, ecapa_loss=0.0001408, whisper_loss=0.09057, over 3825772.61 frames. ], batch size: 79, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:31:41,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4963690.0, ans=0.035 2024-08-20 21:31:42,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs.
limit=15.0 2024-08-20 21:31:47,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4963690.0, ans=0.125 2024-08-20 21:32:01,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4963790.0, ans=0.07 2024-08-20 21:32:07,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4963790.0, ans=0.0 2024-08-20 21:32:24,892 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 from AS 2024-08-20 21:32:37,756 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-20 21:32:39,297 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 16 from LS+wenet, 21 from Vox, 43 from AS 2024-08-20 21:32:39,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4963990.0, ans=0.125 2024-08-20 21:32:44,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4963990.0, ans=0.125 2024-08-20 21:32:52,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4964090.0, ans=15.0 2024-08-20 21:32:55,059 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 23 from LS+wenet, 10 from Vox, 22 from AS 2024-08-20 21:33:09,742 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7450, loss[loss=0.1161, beats_loss=0.01104, ecapa_loss=0.0001352, whisper_loss=0.1037, over 23363.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01029, ecapa_loss=0.0001403, whisper_loss=0.09032, over 3818271.36 frames.
], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:34:00,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-08-20 21:34:03,721 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 27 from LS+wenet, 18 from Vox, 28 from AS 2024-08-20 21:34:08,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.276e+01 2.553e+01 2.837e+01 3.852e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-20 21:34:16,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4964490.0, ans=0.125 2024-08-20 21:34:25,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4964590.0, ans=0.2 2024-08-20 21:34:36,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4964590.0, ans=0.5 2024-08-20 21:34:39,231 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7500, loss[loss=0.08305, beats_loss=0.01151, ecapa_loss=0.0001459, whisper_loss=0.07008, over 15183.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001412, whisper_loss=0.09017, over 3779539.39 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:34:41,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4964690.0, ans=0.125 2024-08-20 21:34:52,124 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts.
26 from LS+wenet, 24 from Vox, 37 from AS 2024-08-20 21:35:02,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4964790.0, ans=0.0 2024-08-20 21:35:05,993 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-20 21:35:06,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4964790.0, ans=0.95 2024-08-20 21:35:20,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2024-08-20 21:35:28,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4964890.0, ans=0.125 2024-08-20 21:36:05,644 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7550, loss[loss=0.09972, beats_loss=0.01117, ecapa_loss=0.0001529, whisper_loss=0.08702, over 14067.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001408, whisper_loss=0.09003, over 3782447.94 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:36:11,318 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 14 from LS+wenet, 12 from Vox, 24 from AS 2024-08-20 21:36:18,243 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 from AS 2024-08-20 21:36:25,117 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts.
24 from LS+wenet, 24 from Vox, 25 from AS 2024-08-20 21:36:35,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4965290.0, ans=0.125 2024-08-20 21:36:40,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4965390.0, ans=0.1 2024-08-20 21:36:42,203 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 from AS 2024-08-20 21:36:50,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4965390.0, ans=0.0 2024-08-20 21:36:55,411 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 from AS 2024-08-20 21:37:02,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.306e+01 2.507e+01 2.711e+01 6.032e+01, threshold=5.014e+01, percent-clipped=1.0 2024-08-20 21:37:06,303 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 from AS 2024-08-20 21:37:11,203 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 from AS 2024-08-20 21:37:13,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4965590.0, ans=0.125 2024-08-20 21:37:29,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4965590.0, ans=0.125 2024-08-20 21:37:31,659 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7600, loss[loss=0.1091, beats_loss=0.008603, ecapa_loss=0.0001272, whisper_loss=0.09925, over 16155.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.000141, whisper_loss=0.09035, over 3764336.98 frames.
], batch size: 62, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:37:38,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4965690.0, ans=0.0 2024-08-20 21:37:45,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4965690.0, ans=0.125 2024-08-20 21:38:07,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4965890.0, ans=0.1 2024-08-20 21:38:14,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-08-20 21:38:28,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4965990.0, ans=0.0 2024-08-20 21:38:38,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4966090.0, ans=0.0 2024-08-20 21:38:40,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4966090.0, ans=0.0 2024-08-20 21:38:56,966 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7650, loss[loss=0.1043, beats_loss=0.008385, ecapa_loss=0.0001329, whisper_loss=0.09462, over 16427.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001414, whisper_loss=0.08919, over 3749368.16 frames. ], batch size: 63, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:38:58,877 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 from AS 2024-08-20 21:39:04,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4966190.0, ans=0.1 2024-08-20 21:39:14,875 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts.
20 from LS+wenet, 14 from Vox, 34 from AS 2024-08-20 21:39:16,316 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 18 from LS+wenet, 19 from Vox, 32 from AS 2024-08-20 21:39:28,299 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS 2024-08-20 21:39:29,972 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 16 from LS+wenet, 14 from Vox, 23 from AS 2024-08-20 21:39:37,540 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 from AS 2024-08-20 21:39:52,075 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 27 from LS+wenet, 15 from Vox, 33 from AS 2024-08-20 21:39:53,586 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.307e+01 2.519e+01 2.833e+01 3.884e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 21:39:54,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4966490.0, ans=0.125 2024-08-20 21:39:57,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4966490.0, ans=0.0 2024-08-20 21:40:21,582 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 27 from Vox, 37 from AS 2024-08-20 21:40:23,457 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7700, loss[loss=0.0917, beats_loss=0.01159, ecapa_loss=0.0001861, whisper_loss=0.07825, over 20459.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.08881, over 3748026.80 frames. ], batch size: 91, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:40:48,989 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 16 from Vox, 22 from AS 2024-08-20 21:41:03,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.37 vs.
limit=15.0 2024-08-20 21:41:07,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4966890.0, ans=0.2 2024-08-20 21:41:09,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4966890.0, ans=0.125 2024-08-20 21:41:21,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4966990.0, ans=0.0 2024-08-20 21:41:23,500 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 from AS 2024-08-20 21:41:48,696 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7750, loss[loss=0.1151, beats_loss=0.01024, ecapa_loss=0.0001546, whisper_loss=0.1033, over 18734.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01037, ecapa_loss=0.0001409, whisper_loss=0.08868, over 3775526.26 frames. ], batch size: 76, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:41:54,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4967190.0, ans=0.125 2024-08-20 21:42:05,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4967290.0, ans=0.125 2024-08-20 21:42:08,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4967290.0, ans=0.5 2024-08-20 21:42:35,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs.
limit=15.0 2024-08-20 21:42:46,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4967490.0, ans=0.2 2024-08-20 21:42:47,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.264e+01 2.525e+01 2.747e+01 3.905e+01, threshold=5.051e+01, percent-clipped=0.0 2024-08-20 21:43:16,717 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7800, loss[loss=0.1027, beats_loss=0.007748, ecapa_loss=0.0001327, whisper_loss=0.09367, over 16801.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01043, ecapa_loss=0.0001391, whisper_loss=0.08847, over 3774324.09 frames. ], batch size: 61, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:43:18,643 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 38 from LS+wenet, 25 from Vox, 30 from AS 2024-08-20 21:43:24,117 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 22 from LS+wenet, 23 from Vox, 27 from AS 2024-08-20 21:43:27,553 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 18 from LS+wenet, 21 from Vox, 33 from AS 2024-08-20 21:43:39,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4967790.0, ans=0.125 2024-08-20 21:43:50,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4967890.0, ans=0.125 2024-08-20 21:44:02,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4967890.0, ans=0.125 2024-08-20 21:44:04,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4967890.0, ans=0.0 2024-08-20 21:44:11,748 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts.
31 from LS+wenet, 20 from Vox, 37 from AS 2024-08-20 21:44:21,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.15 vs. limit=6.0 2024-08-20 21:44:22,192 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 27 from LS+wenet, 16 from Vox, 34 from AS 2024-08-20 21:44:38,832 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 25 from Vox, 40 from AS 2024-08-20 21:44:39,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4968090.0, ans=0.125 2024-08-20 21:44:43,119 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7850, loss[loss=0.09665, beats_loss=0.009668, ecapa_loss=0.0001397, whisper_loss=0.08558, over 23060.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01041, ecapa_loss=0.0001387, whisper_loss=0.08845, over 3776373.75 frames. ], batch size: 94, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:44:47,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4968190.0, ans=0.0 2024-08-20 21:44:49,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4968190.0, ans=0.0 2024-08-20 21:45:00,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4968290.0, ans=0.1 2024-08-20 21:45:05,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.24 vs. limit=10.0 2024-08-20 21:45:15,486 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts.
18 from LS+wenet, 22 from Vox, 37 from AS 2024-08-20 21:45:15,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4968290.0, ans=0.125 2024-08-20 21:45:31,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4968390.0, ans=0.125 2024-08-20 21:45:41,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.317e+01 2.497e+01 2.913e+01 5.826e+01, threshold=4.993e+01, percent-clipped=1.0 2024-08-20 21:46:11,156 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7900, loss[loss=0.08568, beats_loss=0.01119, ecapa_loss=0.0001326, whisper_loss=0.07317, over 17680.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001374, whisper_loss=0.08902, over 3796800.89 frames. ], batch size: 68, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:46:34,192 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 from AS 2024-08-20 21:47:01,884 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 21 from LS+wenet, 21 from Vox, 29 from AS 2024-08-20 21:47:20,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2024-08-20 21:47:38,943 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 7950, loss[loss=0.1115, beats_loss=0.009107, ecapa_loss=0.0001265, whisper_loss=0.1011, over 14856.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001374, whisper_loss=0.08917, over 3830955.00 frames. ], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:47:48,231 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 43 from LS+wenet, 17 from Vox, 29 from AS 2024-08-20 21:47:55,490 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts.
28 from LS+wenet, 27 from Vox, 38 from AS 2024-08-20 21:48:09,159 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 13 from LS+wenet, 14 from Vox, 35 from AS 2024-08-20 21:48:19,803 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 27 from LS+wenet, 27 from Vox, 24 from AS 2024-08-20 21:48:27,344 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 from AS 2024-08-20 21:48:37,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.335e+01 2.581e+01 2.814e+01 4.962e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-20 21:48:58,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4969590.0, ans=0.0 2024-08-20 21:49:07,275 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8000, loss[loss=0.1252, beats_loss=0.008547, ecapa_loss=0.0001253, whisper_loss=0.1154, over 15222.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.08942, over 3803854.55 frames.
], batch size: 56, lr: 1.80e-03, grad_scale: 5.764607523034235e+17 2024-08-20 21:49:11,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4969690.0, ans=0.025 2024-08-20 21:49:11,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4969690.0, ans=0.125 2024-08-20 21:50:00,725 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0865812599658966, model_norm_threshold=51.61667251586914 2024-08-20 21:50:01,191 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.075e+04, grad_sumsq=5.075e+04, orig_rms_sq=1.000e+00 2024-08-20 21:50:03,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4969990.0, ans=0.0 2024-08-20 21:50:06,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4969990.0, ans=0.125 2024-08-20 21:50:11,407 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 26 from Vox, 23 from AS 2024-08-20 21:50:28,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4970090.0, ans=0.1 2024-08-20 21:50:32,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4970090.0, ans=10.0 2024-08-20 21:50:34,965 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8050, loss[loss=0.08695, beats_loss=0.009446, ecapa_loss=0.0001565, whisper_loss=0.07594, over 14495.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01033, ecapa_loss=0.0001394, whisper_loss=0.08976, over 3806879.27 frames.
], batch size: 58, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:50:38,786 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 22 from LS+wenet, 21 from Vox, 33 from AS 2024-08-20 21:50:40,472 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 from AS 2024-08-20 21:51:06,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-20 21:51:11,202 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 30 from Vox, 32 from AS 2024-08-20 21:51:29,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2024-08-20 21:51:37,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.294e+01 2.558e+01 2.951e+01 5.962e+02, threshold=5.117e+01, percent-clipped=2.0 2024-08-20 21:51:48,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4970590.0, ans=0.05 2024-08-20 21:51:48,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2024-08-20 21:52:03,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=12.0 2024-08-20 21:52:03,749 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8100, loss[loss=0.1144, beats_loss=0.01079, ecapa_loss=0.0001671, whisper_loss=0.102, over 21793.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01037, ecapa_loss=0.0001399, whisper_loss=0.08994, over 3814752.94 frames.
], batch size: 91, lr: 1.80e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:52:04,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4970690.0, ans=0.0 2024-08-20 21:52:33,845 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 32 from Vox, 33 from AS 2024-08-20 21:52:36,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4970790.0, ans=0.2 2024-08-20 21:52:40,838 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 22 from LS+wenet, 12 from Vox, 32 from AS 2024-08-20 21:53:04,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4970990.0, ans=0.125 2024-08-20 21:53:34,293 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8150, loss[loss=0.1029, beats_loss=0.00826, ecapa_loss=0.0001335, whisper_loss=0.09326, over 14070.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001397, whisper_loss=0.08994, over 3817835.33 frames. ], batch size: 51, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:53:41,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4971190.0, ans=0.125 2024-08-20 21:53:42,812 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 from AS 2024-08-20 21:54:02,677 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts.
29 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-20 21:54:27,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4971490.0, ans=0.0 2024-08-20 21:54:31,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4971490.0, ans=10.0 2024-08-20 21:54:31,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4971490.0, ans=0.125 2024-08-20 21:54:37,845 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.199e+01 2.482e+01 2.728e+01 1.075e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-20 21:54:44,555 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 30 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-20 21:54:46,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4971590.0, ans=0.0 2024-08-20 21:55:04,246 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8200, loss[loss=0.08823, beats_loss=0.007727, ecapa_loss=0.0001613, whisper_loss=0.07889, over 14148.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001385, whisper_loss=0.08995, over 3830283.40 frames. ], batch size: 54, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:55:51,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-20 21:55:55,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2024-08-20 21:56:06,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. 
limit=22.5 2024-08-20 21:56:15,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4971990.0, ans=0.0 2024-08-20 21:56:16,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4972090.0, ans=0.125 2024-08-20 21:56:34,121 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-20 21:56:34,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4972190.0, ans=0.1 2024-08-20 21:56:35,518 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8250, loss[loss=0.1155, beats_loss=0.01125, ecapa_loss=0.000107, whisper_loss=0.1032, over 23945.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001385, whisper_loss=0.08986, over 3846189.97 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:56:43,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4972190.0, ans=0.1 2024-08-20 21:56:43,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4972190.0, ans=0.0 2024-08-20 21:57:02,703 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 22 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-20 21:57:07,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4972290.0, ans=0.125 2024-08-20 21:57:31,992 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
22 from LS+wenet, 34 from Vox, 33 fro AS 2024-08-20 21:57:37,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.205e+01 2.481e+01 2.878e+01 4.142e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-20 21:57:51,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4972590.0, ans=0.1 2024-08-20 21:57:51,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4972590.0, ans=0.2 2024-08-20 21:57:58,004 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 34 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-20 21:58:01,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4972690.0, ans=0.2 2024-08-20 21:58:02,899 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8300, loss[loss=0.1028, beats_loss=0.00925, ecapa_loss=0.0001574, whisper_loss=0.09196, over 14103.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01038, ecapa_loss=0.0001385, whisper_loss=0.08956, over 3812522.60 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:58:07,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4972690.0, ans=0.1 2024-08-20 21:58:18,963 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-20 21:58:47,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4972890.0, ans=0.0 2024-08-20 21:59:10,405 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
28 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 21:59:24,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4973090.0, ans=0.125 2024-08-20 21:59:31,378 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8350, loss[loss=0.1093, beats_loss=0.00882, ecapa_loss=0.0001487, whisper_loss=0.09898, over 17788.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.08937, over 3814010.31 frames. ], batch size: 72, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 21:59:37,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4973190.0, ans=0.125 2024-08-20 21:59:37,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=12.0 2024-08-20 21:59:40,654 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 27 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-20 21:59:44,477 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 9 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-20 22:00:07,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4973390.0, ans=0.2 2024-08-20 22:00:12,441 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 22:00:12,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4973390.0, ans=0.0 2024-08-20 22:00:33,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.301e+01 2.484e+01 2.727e+01 5.310e+01, threshold=4.967e+01, percent-clipped=1.0 2024-08-20 22:00:41,001 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
31 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-20 22:00:56,303 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 22:00:59,442 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8400, loss[loss=0.08532, beats_loss=0.01269, ecapa_loss=0.0001445, whisper_loss=0.07119, over 22128.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001397, whisper_loss=0.08943, over 3790850.48 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:01:35,984 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 14 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 22:01:37,526 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 26 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-20 22:01:49,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4973890.0, ans=0.05 2024-08-20 22:02:07,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4973990.0, ans=0.125 2024-08-20 22:02:07,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4973990.0, ans=0.125 2024-08-20 22:02:18,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4974090.0, ans=0.125 2024-08-20 22:02:19,243 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 22:02:28,165 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8450, loss[loss=0.09964, beats_loss=0.01262, ecapa_loss=0.0001286, whisper_loss=0.08574, over 21536.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01037, ecapa_loss=0.0001388, whisper_loss=0.08921, over 3815291.76 frames. 
], batch size: 89, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:02:28,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4974190.0, ans=0.0 2024-08-20 22:03:16,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4974390.0, ans=0.125 2024-08-20 22:03:24,709 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 11 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 22:03:32,163 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.304e+01 2.514e+01 2.804e+01 1.040e+02, threshold=5.029e+01, percent-clipped=2.0 2024-08-20 22:03:32,424 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 12 from Vox, 54 fro AS 2024-08-20 22:03:41,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4974590.0, ans=0.125 2024-08-20 22:03:47,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4974590.0, ans=0.1 2024-08-20 22:03:54,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4974590.0, ans=0.025 2024-08-20 22:03:54,550 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.675e-01 2024-08-20 22:03:59,161 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8500, loss[loss=0.09319, beats_loss=0.00996, ecapa_loss=0.000145, whisper_loss=0.08178, over 17247.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01042, ecapa_loss=0.0001387, whisper_loss=0.08862, over 3802067.73 frames. ], batch size: 70, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:04:06,853 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
23 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-20 22:04:34,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4974790.0, ans=0.125 2024-08-20 22:04:39,373 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 22:04:41,147 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-20 22:05:13,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4975090.0, ans=0.125 2024-08-20 22:05:31,231 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8550, loss[loss=0.08311, beats_loss=0.0109, ecapa_loss=0.0001347, whisper_loss=0.07087, over 15158.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01044, ecapa_loss=0.0001398, whisper_loss=0.08858, over 3822460.09 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:05:33,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4975190.0, ans=0.1 2024-08-20 22:05:33,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4975190.0, ans=0.1 2024-08-20 22:05:38,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4975190.0, ans=0.1 2024-08-20 22:06:05,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=12.0 2024-08-20 22:06:12,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2024-08-20 22:06:19,594 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
25 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 22:06:28,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.09 vs. limit=10.0 2024-08-20 22:06:32,999 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.271e+01 2.473e+01 2.779e+01 6.630e+01, threshold=4.947e+01, percent-clipped=2.0 2024-08-20 22:06:49,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4975590.0, ans=0.125 2024-08-20 22:06:54,396 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 11 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 22:06:59,604 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8600, loss[loss=0.1106, beats_loss=0.009518, ecapa_loss=0.0001536, whisper_loss=0.09954, over 20219.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08864, over 3796399.11 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:07:10,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4975690.0, ans=0.125 2024-08-20 22:07:21,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4975790.0, ans=0.0 2024-08-20 22:07:42,282 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-20 22:07:46,155 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-20 22:07:56,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4975990.0, ans=0.05 2024-08-20 22:08:03,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4975990.0, ans=0.125 2024-08-20 22:08:22,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4976090.0, ans=0.0 2024-08-20 22:08:22,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4976090.0, ans=0.125 2024-08-20 22:08:34,394 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8650, loss[loss=0.1127, beats_loss=0.01121, ecapa_loss=0.0001232, whisper_loss=0.1003, over 16700.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.0893, over 3788836.75 frames. ], batch size: 65, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:08:35,949 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 9 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-20 22:08:37,938 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 15 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-20 22:08:39,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4976190.0, ans=0.125 2024-08-20 22:08:42,750 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
29 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-20 22:09:08,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4976390.0, ans=0.125 2024-08-20 22:09:26,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4976490.0, ans=0.125 2024-08-20 22:09:37,400 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.289e+01 2.459e+01 2.667e+01 5.043e+01, threshold=4.917e+01, percent-clipped=1.0 2024-08-20 22:09:41,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4976490.0, ans=0.2 2024-08-20 22:10:00,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4976590.0, ans=0.125 2024-08-20 22:10:03,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4976690.0, ans=0.1 2024-08-20 22:10:04,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4976690.0, ans=0.0 2024-08-20 22:10:04,895 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8700, loss[loss=0.07936, beats_loss=0.01008, ecapa_loss=0.0001127, whisper_loss=0.06815, over 15638.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01037, ecapa_loss=0.0001402, whisper_loss=0.08893, over 3783550.88 frames. 
], batch size: 60, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:10:05,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4976690.0, ans=0.0 2024-08-20 22:10:23,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4976690.0, ans=0.2 2024-08-20 22:10:33,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4976790.0, ans=0.0 2024-08-20 22:10:35,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4976790.0, ans=0.025 2024-08-20 22:11:10,526 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 30 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 22:11:16,858 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 24 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 22:11:18,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4976990.0, ans=0.1 2024-08-20 22:11:31,758 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 22:11:40,072 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8750, loss[loss=0.09626, beats_loss=0.01104, ecapa_loss=0.0001106, whisper_loss=0.08412, over 22916.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01045, ecapa_loss=0.0001386, whisper_loss=0.08839, over 3789457.20 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:11:56,357 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 22:12:07,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4977290.0, ans=0.1 2024-08-20 22:12:38,104 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-20 22:12:40,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4977490.0, ans=0.1 2024-08-20 22:12:41,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.311e+01 2.545e+01 2.845e+01 5.108e+01, threshold=5.089e+01, percent-clipped=1.0 2024-08-20 22:12:51,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4977590.0, ans=0.125 2024-08-20 22:13:03,328 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 16 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 22:13:08,475 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8800, loss[loss=0.1065, beats_loss=0.008974, ecapa_loss=0.0001597, whisper_loss=0.09592, over 20570.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.08939, over 3804758.23 frames. 
], batch size: 84, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:13:42,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4977890.0, ans=0.125 2024-08-20 22:13:47,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4977890.0, ans=0.125 2024-08-20 22:13:47,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4977890.0, ans=0.125 2024-08-20 22:13:51,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4977890.0, ans=0.1 2024-08-20 22:13:52,926 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 18 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-20 22:13:53,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4977890.0, ans=0.0 2024-08-20 22:14:02,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4977990.0, ans=0.125 2024-08-20 22:14:03,714 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 15 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-20 22:14:31,380 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 23 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-20 22:14:36,181 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8850, loss[loss=0.1122, beats_loss=0.00802, ecapa_loss=0.0001392, whisper_loss=0.1028, over 23789.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001384, whisper_loss=0.0895, over 3800967.83 frames. 
], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:14:59,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4978290.0, ans=0.125 2024-08-20 22:15:14,279 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 22:15:19,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4978390.0, ans=0.0 2024-08-20 22:15:19,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4978390.0, ans=0.125 2024-08-20 22:15:37,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.277e+01 2.445e+01 2.853e+01 5.587e+01, threshold=4.890e+01, percent-clipped=1.0 2024-08-20 22:15:39,599 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-20 22:15:41,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4978490.0, ans=0.0 2024-08-20 22:15:50,608 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 34 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-20 22:15:59,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4978590.0, ans=0.125 2024-08-20 22:16:03,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4978690.0, ans=0.025 2024-08-20 22:16:04,078 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8900, loss[loss=0.1145, beats_loss=0.01058, ecapa_loss=9.714e-05, whisper_loss=0.103, over 21355.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001381, whisper_loss=0.09013, over 3814978.15 frames. 
], batch size: 80, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:16:42,739 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 22:17:10,179 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-20 22:17:32,253 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 8950, loss[loss=0.1025, beats_loss=0.009984, ecapa_loss=0.0001226, whisper_loss=0.09126, over 20986.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001377, whisper_loss=0.09013, over 3825076.79 frames. ], batch size: 82, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:17:35,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4979190.0, ans=0.125 2024-08-20 22:18:07,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4979390.0, ans=0.0 2024-08-20 22:18:32,868 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.269e+01 2.591e+01 2.866e+01 4.170e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-20 22:18:33,152 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 20 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-20 22:18:34,544 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 35 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-20 22:18:59,280 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9000, loss[loss=0.1182, beats_loss=0.008362, ecapa_loss=0.0001189, whisper_loss=0.1087, over 17025.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.000138, whisper_loss=0.08974, over 3817735.78 frames. 
], batch size: 61, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:18:59,281 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 22:19:24,010 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4865, 2.9410, 2.9408, 1.9537, 0.2627, 3.7664, 3.4057, 1.2551], device='cuda:0') 2024-08-20 22:19:38,105 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on ASR_libri: loss=0.2541, beats_loss=0, ecapa_loss=0.0005128, whisper_loss=0.249, over 931116.00 frames. 2024-08-20 22:20:04,905 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on SV_voxceleb1: loss=0.003932, beats_loss=0, ecapa_loss=0.0003932, whisper_loss=0, over 944235.00 frames. 2024-08-20 22:21:44,433 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on AT_audioset: loss=0.02294, beats_loss=0.02294, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 22:21:44,442 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 22:21:49,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4979690.0, ans=0.1 2024-08-20 22:21:51,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4979690.0, ans=0.125 2024-08-20 22:21:56,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4979690.0, ans=0.125 2024-08-20 22:21:58,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4979690.0, ans=0.025 2024-08-20 22:22:07,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=4979790.0, ans=0.5 2024-08-20 22:22:21,765 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4979890.0, ans=0.2 2024-08-20 22:22:21,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4979890.0, ans=0.1 2024-08-20 22:22:21,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4979890.0, ans=0.2 2024-08-20 22:22:33,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4979890.0, ans=0.0 2024-08-20 22:22:42,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4979990.0, ans=0.125 2024-08-20 22:23:11,220 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9050, loss[loss=0.09392, beats_loss=0.01171, ecapa_loss=0.000143, whisper_loss=0.08078, over 15724.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001387, whisper_loss=0.08961, over 3817437.39 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:23:20,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4980190.0, ans=0.0 2024-08-20 22:23:41,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-08-20 22:23:45,547 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 22 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-20 22:23:47,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. 
limit=15.0 2024-08-20 22:23:51,088 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.023e+05 2024-08-20 22:23:55,079 WARNING [optim.py:496] (0/4) Scaling gradients by 0.043528925627470016, model_norm_threshold=51.82819747924805 2024-08-20 22:23:55,236 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.654e+05, grad_sumsq=2.654e+05, orig_rms_sq=1.000e+00 2024-08-20 22:24:04,871 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 24 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-20 22:24:06,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4980490.0, ans=0.2 2024-08-20 22:24:11,541 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.252e+01 2.512e+01 2.739e+01 1.191e+03, threshold=5.024e+01, percent-clipped=1.0 2024-08-20 22:24:20,072 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 27 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-20 22:24:36,501 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9100, loss[loss=0.1117, beats_loss=0.009605, ecapa_loss=0.0001235, whisper_loss=0.1009, over 20886.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.000139, whisper_loss=0.08967, over 3793418.55 frames. 
], batch size: 81, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:24:37,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4980690.0, ans=0.125 2024-08-20 22:24:39,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4980690.0, ans=0.05 2024-08-20 22:25:22,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2024-08-20 22:25:28,431 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-20 22:25:39,964 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 21 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-20 22:25:43,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4981090.0, ans=0.0 2024-08-20 22:25:59,825 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9150, loss[loss=0.09793, beats_loss=0.01161, ecapa_loss=0.0001299, whisper_loss=0.08502, over 22559.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001393, whisper_loss=0.08959, over 3813467.68 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:26:04,963 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 12 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-20 22:26:06,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4981190.0, ans=0.09899494936611666 2024-08-20 22:26:20,925 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
15 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-20 22:26:25,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4981290.0, ans=0.2 2024-08-20 22:26:29,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0 2024-08-20 22:26:36,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4981390.0, ans=0.0 2024-08-20 22:26:40,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4981390.0, ans=0.0 2024-08-20 22:26:46,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4981390.0, ans=0.1 2024-08-20 22:26:54,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4981490.0, ans=0.2 2024-08-20 22:26:58,644 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.233e+01 2.459e+01 2.675e+01 3.716e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-20 22:27:23,681 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9200, loss[loss=0.1383, beats_loss=0.006261, ecapa_loss=0.0001363, whisper_loss=0.1306, over 16923.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01043, ecapa_loss=0.0001388, whisper_loss=0.08908, over 3826395.50 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:27:54,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4981790.0, ans=0.05 2024-08-20 22:28:09,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. 
limit=10.0 2024-08-20 22:28:15,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4981990.0, ans=0.1 2024-08-20 22:28:21,442 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 22 from LS+wenet, 31 from Vox, 41 fro AS 2024-08-20 22:28:29,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4982090.0, ans=0.0 2024-08-20 22:28:48,013 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9250, loss[loss=0.08723, beats_loss=0.0133, ecapa_loss=0.0001134, whisper_loss=0.0728, over 17039.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001392, whisper_loss=0.08947, over 3796281.78 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:29:02,279 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 37 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 22:29:04,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4982290.0, ans=0.125 2024-08-20 22:29:49,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.258e+01 2.506e+01 2.833e+01 3.659e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-20 22:30:11,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4982590.0, ans=0.2 2024-08-20 22:30:15,779 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9300, loss[loss=0.1081, beats_loss=0.01055, ecapa_loss=0.0001275, whisper_loss=0.09631, over 13240.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.08923, over 3788022.68 frames. 
], batch size: 50, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:30:41,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4982790.0, ans=0.1 2024-08-20 22:30:52,864 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 22:31:15,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4982990.0, ans=0.2 2024-08-20 22:31:25,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0 2024-08-20 22:31:27,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4983090.0, ans=0.1 2024-08-20 22:31:42,223 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9350, loss[loss=0.07806, beats_loss=0.01258, ecapa_loss=0.0001418, whisper_loss=0.06406, over 19168.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01041, ecapa_loss=0.0001392, whisper_loss=0.0893, over 3807871.38 frames. ], batch size: 83, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:31:48,225 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
20 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-20 22:32:19,741 WARNING [optim.py:496] (0/4) Scaling gradients by 0.011810386553406715, model_norm_threshold=50.11380386352539 2024-08-20 22:32:19,899 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.320e+06, grad_sumsq=2.158e+08, orig_rms_sq=1.075e-02 2024-08-20 22:32:33,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4983490.0, ans=0.1 2024-08-20 22:32:40,088 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 13 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 22:32:41,181 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.343e+01 2.545e+01 2.927e+01 4.243e+03, threshold=5.090e+01, percent-clipped=3.0 2024-08-20 22:32:43,717 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 35 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 22:32:51,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=4983590.0, ans=0.025 2024-08-20 22:33:06,848 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9400, loss[loss=0.1253, beats_loss=0.008479, ecapa_loss=0.000161, whisper_loss=0.1152, over 23544.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.08927, over 3836985.35 frames. 
], batch size: 94, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:33:07,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4983690.0, ans=0.1 2024-08-20 22:33:14,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4983690.0, ans=0.2 2024-08-20 22:33:14,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4983690.0, ans=0.125 2024-08-20 22:33:29,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4983790.0, ans=0.125 2024-08-20 22:34:12,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4983990.0, ans=0.125 2024-08-20 22:34:19,164 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.405e-01 2024-08-20 22:34:24,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-08-20 22:34:33,352 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9450, loss[loss=0.09681, beats_loss=0.01305, ecapa_loss=0.0001254, whisper_loss=0.0825, over 22703.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01054, ecapa_loss=0.0001392, whisper_loss=0.08842, over 3860298.42 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:34:35,703 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 22:35:04,633 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 
27 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-20 22:35:06,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4984390.0, ans=0.2 2024-08-20 22:35:13,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-08-20 22:35:13,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.84 vs. limit=15.0 2024-08-20 22:35:32,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.335e+01 2.553e+01 2.784e+01 4.072e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-20 22:35:43,171 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 22:35:45,432 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 22:35:58,724 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9500, loss[loss=0.1049, beats_loss=0.008689, ecapa_loss=0.0001377, whisper_loss=0.09486, over 16973.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001394, whisper_loss=0.08903, over 3833255.52 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:36:08,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=12.0 2024-08-20 22:36:43,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.50 vs. limit=22.5 2024-08-20 22:36:43,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2024-08-20 22:36:55,971 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-20 22:37:09,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4985090.0, ans=0.2 2024-08-20 22:37:26,244 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9550, loss[loss=0.1138, beats_loss=0.008984, ecapa_loss=0.0001619, whisper_loss=0.1032, over 21986.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.000139, whisper_loss=0.08881, over 3818185.58 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:37:41,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4985290.0, ans=0.0 2024-08-20 22:37:57,211 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-20 22:38:07,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-08-20 22:38:16,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=15.0 2024-08-20 22:38:18,653 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 18 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-20 22:38:20,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-08-20 22:38:23,884 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
27 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 22:38:24,976 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.187e+01 2.362e+01 2.605e+01 3.929e+01, threshold=4.725e+01, percent-clipped=0.0 2024-08-20 22:38:36,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4985590.0, ans=0.125 2024-08-20 22:38:36,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0 2024-08-20 22:38:51,968 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9600, loss[loss=0.09992, beats_loss=0.009797, ecapa_loss=0.0002055, whisper_loss=0.08806, over 20862.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01055, ecapa_loss=0.0001389, whisper_loss=0.08858, over 3802107.34 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:38:57,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-20 22:39:03,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4985690.0, ans=0.125 2024-08-20 22:39:08,201 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 19 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-20 22:39:23,592 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 30 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 22:39:36,401 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 22:39:43,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4985990.0, ans=0.0 2024-08-20 22:40:21,435 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9650, loss[loss=0.09135, beats_loss=0.01251, ecapa_loss=0.0001204, whisper_loss=0.07764, over 22856.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001388, whisper_loss=0.08916, over 3766460.88 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:40:34,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4986190.0, ans=0.125 2024-08-20 22:40:44,179 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-20 22:41:06,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4986390.0, ans=0.0 2024-08-20 22:41:13,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4986490.0, ans=0.0 2024-08-20 22:41:19,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4986490.0, ans=0.0 2024-08-20 22:41:22,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.643e+01 2.288e+01 2.415e+01 2.707e+01 3.907e+01, threshold=4.829e+01, percent-clipped=0.0 2024-08-20 22:41:34,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4986590.0, ans=0.125 2024-08-20 22:41:45,575 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-20 22:41:48,627 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9700, loss[loss=0.1281, beats_loss=0.006359, ecapa_loss=0.0001735, whisper_loss=0.12, over 18953.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01043, ecapa_loss=0.0001394, whisper_loss=0.08875, over 3767853.19 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:41:49,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4986690.0, ans=0.125 2024-08-20 22:41:55,371 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 29 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-20 22:41:57,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4986690.0, ans=0.0 2024-08-20 22:42:20,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4986790.0, ans=0.1 2024-08-20 22:42:23,351 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 13 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 22:42:26,556 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 22:42:30,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4986890.0, ans=0.0 2024-08-20 22:42:33,366 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 22:43:15,061 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9750, loss[loss=0.1132, beats_loss=0.009209, ecapa_loss=0.0001476, whisper_loss=0.1025, over 20535.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01035, ecapa_loss=0.0001392, whisper_loss=0.08904, over 3767756.11 frames. ], batch size: 83, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:43:24,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2024-08-20 22:43:32,945 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 
15 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-20 22:43:48,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.23 vs. limit=10.0 2024-08-20 22:44:15,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.197e+01 2.422e+01 2.722e+01 3.881e+01, threshold=4.844e+01, percent-clipped=0.0 2024-08-20 22:44:18,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4987490.0, ans=0.125 2024-08-20 22:44:33,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4987590.0, ans=0.1 2024-08-20 22:44:38,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4987590.0, ans=0.125 2024-08-20 22:44:38,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4987590.0, ans=0.125 2024-08-20 22:44:41,303 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9800, loss[loss=0.105, beats_loss=0.01057, ecapa_loss=0.0001544, whisper_loss=0.0929, over 17856.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001387, whisper_loss=0.08915, over 3814928.46 frames. 
], batch size: 72, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:44:49,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4987690.0, ans=0.1 2024-08-20 22:45:08,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4987790.0, ans=0.125 2024-08-20 22:45:39,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4987990.0, ans=0.125 2024-08-20 22:45:54,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.12 vs. limit=15.0 2024-08-20 22:46:06,809 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9850, loss[loss=0.08824, beats_loss=0.01052, ecapa_loss=0.0001819, whisper_loss=0.0759, over 14764.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01045, ecapa_loss=0.0001384, whisper_loss=0.08854, over 3781389.58 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:46:07,165 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 24 from Vox, 17 fro AS 2024-08-20 22:46:12,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4988190.0, ans=0.0 2024-08-20 22:46:15,979 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 22 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 22:46:26,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4988290.0, ans=0.0 2024-08-20 22:46:33,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4988290.0, ans=0.0 2024-08-20 22:46:42,488 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
28 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-20 22:46:49,511 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-20 22:46:54,183 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 22 from LS+wenet, 11 from Vox, 21 fro AS 2024-08-20 22:46:54,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4988390.0, ans=0.125 2024-08-20 22:46:56,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4988390.0, ans=0.125 2024-08-20 22:47:07,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.349e+01 2.590e+01 2.982e+01 4.203e+01, threshold=5.180e+01, percent-clipped=0.0 2024-08-20 22:47:12,982 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 16 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-20 22:47:19,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-20 22:47:32,066 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 29 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 22:47:34,501 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9900, loss[loss=0.121, beats_loss=0.008511, ecapa_loss=0.0001496, whisper_loss=0.111, over 19433.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01042, ecapa_loss=0.0001391, whisper_loss=0.0889, over 3758133.86 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:47:34,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4988690.0, ans=0.0 2024-08-20 22:47:36,005 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
25 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-20 22:47:43,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4988690.0, ans=0.2 2024-08-20 22:47:43,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4988690.0, ans=0.2 2024-08-20 22:48:03,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0 2024-08-20 22:48:04,316 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 12 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-20 22:48:54,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4989090.0, ans=0.125 2024-08-20 22:48:58,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4989090.0, ans=0.1 2024-08-20 22:49:01,534 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 9950, loss[loss=0.07605, beats_loss=0.01056, ecapa_loss=0.000152, whisper_loss=0.06397, over 16585.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01047, ecapa_loss=0.0001385, whisper_loss=0.08889, over 3751684.73 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:49:19,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4989290.0, ans=0.0 2024-08-20 22:49:23,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4989290.0, ans=0.0 2024-08-20 22:49:35,809 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
23 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-20 22:50:02,139 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.629e+01 2.261e+01 2.497e+01 2.811e+01 6.221e+01, threshold=4.994e+01, percent-clipped=1.0 2024-08-20 22:50:11,413 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-20 22:50:11,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4989590.0, ans=0.125 2024-08-20 22:50:16,168 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 22:50:28,708 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10000, loss[loss=0.09937, beats_loss=0.01288, ecapa_loss=0.0001263, whisper_loss=0.08523, over 23757.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01067, ecapa_loss=0.0001372, whisper_loss=0.08821, over 3760905.61 frames. ], batch size: 96, lr: 1.79e-03, grad_scale: 1.4411518807585587e+17 2024-08-20 22:50:33,056 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 33 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-20 22:50:39,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2024-08-20 22:50:40,152 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 16 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-20 22:50:47,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4989790.0, ans=0.125 2024-08-20 22:50:49,468 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-20 22:51:05,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4989890.0, ans=0.0 2024-08-20 22:51:43,092 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
23 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-20 22:51:52,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4990090.0, ans=0.2 2024-08-20 22:51:57,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4990190.0, ans=0.125 2024-08-20 22:51:58,422 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10050, loss[loss=0.1113, beats_loss=0.01095, ecapa_loss=0.0001348, whisper_loss=0.09902, over 19299.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01069, ecapa_loss=0.0001368, whisper_loss=0.08867, over 3775145.17 frames. ], batch size: 76, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:51:58,664 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 20 from LS+wenet, 31 from Vox, 44 fro AS 2024-08-20 22:52:02,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-08-20 22:52:12,448 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07415100187063217, model_norm_threshold=49.94480514526367 2024-08-20 22:52:12,603 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.37, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.662e+05, grad_sumsq=1.662e+05, orig_rms_sq=1.000e+00 2024-08-20 22:52:13,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4990190.0, ans=0.125 2024-08-20 22:52:23,104 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 22:52:33,777 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-20 22:52:41,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4990390.0, ans=0.125 2024-08-20 22:52:46,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4990390.0, ans=0.125 2024-08-20 22:53:00,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.315e+01 2.541e+01 2.876e+01 6.736e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-20 22:53:18,049 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 16 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-20 22:53:28,271 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10100, loss[loss=0.08133, beats_loss=0.01187, ecapa_loss=0.0001249, whisper_loss=0.06822, over 21128.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01071, ecapa_loss=0.0001372, whisper_loss=0.08826, over 3811067.34 frames. ], batch size: 87, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:53:34,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4990690.0, ans=0.125 2024-08-20 22:53:34,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4990690.0, ans=0.125 2024-08-20 22:53:55,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-08-20 22:54:04,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4990890.0, ans=0.0 2024-08-20 22:54:05,999 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 22:54:35,503 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
24 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-20 22:54:43,626 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 12 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-20 22:54:54,994 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10150, loss[loss=0.1028, beats_loss=0.01137, ecapa_loss=0.000106, whisper_loss=0.09039, over 23646.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0108, ecapa_loss=0.0001364, whisper_loss=0.08838, over 3811247.53 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:54:56,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5 2024-08-20 22:55:33,013 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-20 22:55:48,557 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 33 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-20 22:55:56,414 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.245e+01 2.529e+01 2.822e+01 1.463e+02, threshold=5.058e+01, percent-clipped=1.0 2024-08-20 22:55:59,047 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 22:56:18,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4991590.0, ans=0.0 2024-08-20 22:56:22,408 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10200, loss[loss=0.1093, beats_loss=0.009997, ecapa_loss=0.00015, whisper_loss=0.09779, over 19429.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01071, ecapa_loss=0.0001371, whisper_loss=0.08905, over 3834888.13 frames. 
], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:56:30,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4991690.0, ans=0.125 2024-08-20 22:56:38,922 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-20 22:56:40,625 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 14 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-20 22:56:42,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2024-08-20 22:56:52,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4991790.0, ans=0.0 2024-08-20 22:56:58,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4991890.0, ans=0.125 2024-08-20 22:57:05,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4991890.0, ans=0.0 2024-08-20 22:57:27,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4991990.0, ans=0.1 2024-08-20 22:57:27,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4991990.0, ans=0.125 2024-08-20 22:57:38,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-08-20 22:57:54,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4992190.0, ans=0.2 2024-08-20 22:57:54,884 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10250, loss[loss=0.1008, beats_loss=0.007876, ecapa_loss=0.0001149, whisper_loss=0.09182, over 14452.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0107, ecapa_loss=0.0001368, whisper_loss=0.08913, over 3841332.57 frames. ], batch size: 53, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:58:08,722 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 24 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-20 22:58:13,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2024-08-20 22:58:54,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4992490.0, ans=0.2 2024-08-20 22:58:54,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-08-20 22:58:59,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.301e+01 2.546e+01 2.793e+01 3.961e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-20 22:59:07,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-20 22:59:26,741 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10300, loss[loss=0.1169, beats_loss=0.00901, ecapa_loss=0.0002089, whisper_loss=0.1058, over 20998.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.000138, whisper_loss=0.08946, over 3834462.27 frames. 
], batch size: 88, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 22:59:55,646 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 13 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-20 23:00:08,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-08-20 23:00:12,666 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-20 23:00:14,551 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 23:00:23,898 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-20 23:00:25,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4992990.0, ans=0.125 2024-08-20 23:00:26,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.00 vs. limit=10.0 2024-08-20 23:00:31,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4992990.0, ans=10.0 2024-08-20 23:00:37,847 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 12 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-20 23:00:40,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4993090.0, ans=0.0 2024-08-20 23:00:42,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4993090.0, ans=0.125 2024-08-20 23:00:49,204 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
19 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 23:00:58,094 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10350, loss[loss=0.127, beats_loss=0.009267, ecapa_loss=0.0001363, whisper_loss=0.1163, over 23515.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001384, whisper_loss=0.08945, over 3850350.95 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:01:19,995 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-20 23:01:22,017 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-20 23:01:25,299 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 28 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-20 23:01:40,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4993390.0, ans=0.0 2024-08-20 23:01:49,999 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 20 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-20 23:02:01,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.402e+01 2.644e+01 3.028e+01 6.335e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-20 23:02:26,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4993590.0, ans=0.0 2024-08-20 23:02:29,162 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10400, loss[loss=0.1167, beats_loss=0.008518, ecapa_loss=0.0001748, whisper_loss=0.1064, over 15473.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.08986, over 3841010.50 frames. ], batch size: 63, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:02:29,893 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 23:02:30,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4993690.0, ans=10.0 2024-08-20 23:02:33,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4993690.0, ans=0.1 2024-08-20 23:02:45,047 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-20 23:03:03,681 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-20 23:03:13,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4993890.0, ans=0.2 2024-08-20 23:03:24,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4993990.0, ans=0.0 2024-08-20 23:03:59,830 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10450, loss[loss=0.1048, beats_loss=0.01026, ecapa_loss=0.0001297, whisper_loss=0.09325, over 19477.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.08986, over 3822366.30 frames. 
], batch size: 75, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:04:28,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4994290.0, ans=0.2 2024-08-20 23:04:28,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4994290.0, ans=0.125 2024-08-20 23:04:59,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=4994490.0, ans=10.0 2024-08-20 23:05:04,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.282e+01 2.461e+01 2.662e+01 8.138e+01, threshold=4.922e+01, percent-clipped=1.0 2024-08-20 23:05:28,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4994590.0, ans=0.0 2024-08-20 23:05:31,032 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10500, loss[loss=0.112, beats_loss=0.01086, ecapa_loss=0.0001045, whisper_loss=0.1001, over 23619.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001393, whisper_loss=0.08961, over 3843745.34 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:05:31,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4994690.0, ans=0.125 2024-08-20 23:05:37,056 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 29 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-20 23:05:40,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4994690.0, ans=0.125 2024-08-20 23:05:40,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.44 vs. 
limit=22.5 2024-08-20 23:06:19,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4994890.0, ans=0.125 2024-08-20 23:06:23,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-08-20 23:06:33,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-20 23:06:35,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4994990.0, ans=0.125 2024-08-20 23:06:41,904 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-20 23:06:44,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4995090.0, ans=0.1 2024-08-20 23:06:58,687 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10550, loss[loss=0.0884, beats_loss=0.01066, ecapa_loss=0.0001517, whisper_loss=0.07623, over 13029.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001405, whisper_loss=0.09066, over 3859985.70 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:07:09,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4995190.0, ans=0.0 2024-08-20 23:07:23,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2024-08-20 23:07:49,171 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-20 23:07:49,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4995490.0, ans=0.1 2024-08-20 23:07:59,151 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.287e+01 2.503e+01 2.760e+01 4.390e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-20 23:07:59,355 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 29 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-20 23:08:02,780 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 20 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-20 23:08:06,536 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:08:11,877 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-20 23:08:21,002 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 23:08:25,598 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10600, loss[loss=0.1077, beats_loss=0.01152, ecapa_loss=0.0001176, whisper_loss=0.09496, over 23046.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0103, ecapa_loss=0.000141, whisper_loss=0.09063, over 3860476.88 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:08:34,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4995690.0, ans=0.125 2024-08-20 23:08:53,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4995790.0, ans=0.0 2024-08-20 23:09:05,563 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 23:09:31,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4995990.0, ans=0.0 2024-08-20 23:09:33,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4995990.0, ans=0.0 2024-08-20 23:09:34,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-08-20 23:09:46,468 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-20 23:09:59,578 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10650, loss[loss=0.1103, beats_loss=0.01159, ecapa_loss=0.0001207, whisper_loss=0.09753, over 22988.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001393, whisper_loss=0.09023, over 3848907.43 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:10:18,423 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 23:10:45,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4996390.0, ans=0.2 2024-08-20 23:10:56,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4996490.0, ans=0.2 2024-08-20 23:11:03,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. 
limit=6.0 2024-08-20 23:11:06,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.258e+01 2.484e+01 2.690e+01 6.351e+01, threshold=4.969e+01, percent-clipped=1.0 2024-08-20 23:11:07,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4996490.0, ans=0.0 2024-08-20 23:11:33,150 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10700, loss[loss=0.1023, beats_loss=0.009262, ecapa_loss=0.0001842, whisper_loss=0.09122, over 13072.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001393, whisper_loss=0.09003, over 3855787.39 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:11:39,500 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 23:11:52,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4996790.0, ans=0.2 2024-08-20 23:12:17,527 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 23:12:19,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4996890.0, ans=0.2 2024-08-20 23:12:28,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4996990.0, ans=0.0 2024-08-20 23:12:34,267 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-20 23:12:37,442 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-20 23:12:37,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4996990.0, ans=0.125 2024-08-20 23:12:37,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4996990.0, ans=0.0 2024-08-20 23:12:38,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.94 vs. limit=15.0 2024-08-20 23:12:43,065 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-20 23:12:48,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4997090.0, ans=0.125 2024-08-20 23:12:50,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4997090.0, ans=0.125 2024-08-20 23:13:05,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4997190.0, ans=15.0 2024-08-20 23:13:06,186 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10750, loss[loss=0.07642, beats_loss=0.0109, ecapa_loss=0.0001387, whisper_loss=0.06413, over 20081.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001388, whisper_loss=0.08917, over 3844235.75 frames. 
], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:13:07,345 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-20 23:13:17,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4997190.0, ans=0.125 2024-08-20 23:13:29,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4997290.0, ans=0.0 2024-08-20 23:13:29,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.35 vs. limit=22.5 2024-08-20 23:13:36,969 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 25 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-20 23:13:46,895 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 27 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-20 23:13:56,357 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 23:14:11,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4997490.0, ans=0.0 2024-08-20 23:14:18,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.299e+01 2.568e+01 2.833e+01 1.794e+02, threshold=5.137e+01, percent-clipped=1.0 2024-08-20 23:14:38,355 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 23:14:46,992 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10800, loss[loss=0.1095, beats_loss=0.0101, ecapa_loss=0.0001462, whisper_loss=0.09793, over 22287.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001391, whisper_loss=0.08933, over 3844564.87 frames. 
], batch size: 89, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:14:48,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4997690.0, ans=0.125 2024-08-20 23:15:02,269 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-20 23:15:08,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4997790.0, ans=0.1 2024-08-20 23:15:14,990 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 27 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-20 23:15:17,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2024-08-20 23:15:35,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=15.0 2024-08-20 23:16:04,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2024-08-20 23:16:17,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4998090.0, ans=0.125 2024-08-20 23:16:20,524 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10850, loss[loss=0.09221, beats_loss=0.01261, ecapa_loss=0.0001147, whisper_loss=0.07845, over 20227.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01053, ecapa_loss=0.0001382, whisper_loss=0.08842, over 3817832.05 frames. 
], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:16:20,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4998190.0, ans=0.0 2024-08-20 23:16:21,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-20 23:16:22,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4998190.0, ans=0.2 2024-08-20 23:16:58,504 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 30 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 23:17:02,280 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 23:17:10,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4998390.0, ans=10.0 2024-08-20 23:17:24,045 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.245e+01 2.613e+01 2.967e+01 9.176e+01, threshold=5.227e+01, percent-clipped=1.0 2024-08-20 23:17:45,809 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-20 23:17:51,198 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10900, loss[loss=0.1004, beats_loss=0.01057, ecapa_loss=0.0001509, whisper_loss=0.08828, over 19057.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01053, ecapa_loss=0.0001373, whisper_loss=0.08891, over 3846463.77 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:18:15,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4998790.0, ans=0.125 2024-08-20 23:18:21,687 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
24 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-20 23:18:21,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4998790.0, ans=0.125 2024-08-20 23:18:25,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4998890.0, ans=0.0 2024-08-20 23:18:44,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4998990.0, ans=0.0 2024-08-20 23:18:47,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-08-20 23:18:59,597 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-20 23:19:06,921 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-20 23:19:07,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4999090.0, ans=0.0 2024-08-20 23:19:08,396 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 15 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-20 23:19:14,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4999090.0, ans=0.1 2024-08-20 23:19:16,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4999090.0, ans=0.1 2024-08-20 23:19:16,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4999090.0, ans=0.0 2024-08-20 23:19:21,335 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 10950, loss[loss=0.09393, beats_loss=0.01234, ecapa_loss=0.0001283, whisper_loss=0.0803, over 22401.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001372, whisper_loss=0.08931, over 3845185.03 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:19:23,234 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-20 23:19:30,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-08-20 23:19:32,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4999190.0, ans=0.125 2024-08-20 23:19:37,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4999290.0, ans=0.0 2024-08-20 23:19:45,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4999290.0, ans=0.0 2024-08-20 23:19:45,641 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.129e-02 2024-08-20 23:20:10,493 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-20 23:20:16,162 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-20 23:20:20,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4999490.0, ans=0.1 2024-08-20 23:20:25,224 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.300e+01 2.546e+01 2.869e+01 3.848e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-20 23:20:27,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4999490.0, ans=0.5 2024-08-20 23:20:41,957 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-20 23:20:44,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4999590.0, ans=0.0 2024-08-20 23:20:46,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4999590.0, ans=0.125 2024-08-20 23:20:52,494 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11000, loss[loss=0.1119, beats_loss=0.01028, ecapa_loss=0.0001602, whisper_loss=0.1, over 22325.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001376, whisper_loss=0.09014, over 3829438.16 frames. ], batch size: 92, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:20:52,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4999690.0, ans=0.1 2024-08-20 23:20:56,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4999690.0, ans=0.125 2024-08-20 23:21:09,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=4999790.0, ans=0.02 2024-08-20 23:21:14,444 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 19 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-20 23:21:44,563 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
27 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-20 23:21:48,561 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-500000.pt 2024-08-20 23:21:52,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4999990.0, ans=0.125 2024-08-20 23:21:58,115 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 28 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 23:22:02,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4999990.0, ans=0.1 2024-08-20 23:22:27,404 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11050, loss[loss=0.1169, beats_loss=0.008203, ecapa_loss=0.0001449, whisper_loss=0.1072, over 14215.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001378, whisper_loss=0.09109, over 3843357.00 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:22:31,047 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-20 23:22:51,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5000290.0, ans=0.1 2024-08-20 23:22:54,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5000290.0, ans=0.2 2024-08-20 23:22:57,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5000290.0, ans=0.0 2024-08-20 23:23:12,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5000390.0, ans=0.1 2024-08-20 23:23:22,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5000490.0, ans=0.0 2024-08-20 23:23:32,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.245e+01 2.536e+01 2.821e+01 3.723e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-20 23:24:01,827 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11100, loss[loss=0.1011, beats_loss=0.009592, ecapa_loss=0.0001432, whisper_loss=0.09003, over 21558.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.000137, whisper_loss=0.09129, over 3861819.94 frames. ], batch size: 87, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:24:08,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5000690.0, ans=0.1 2024-08-20 23:24:11,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5000690.0, ans=0.0 2024-08-20 23:24:14,685 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 18 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-20 23:24:50,287 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
25 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-20 23:24:55,557 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-20 23:25:02,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2024-08-20 23:25:19,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5000990.0, ans=0.125 2024-08-20 23:25:22,697 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-20 23:25:33,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5001090.0, ans=0.0 2024-08-20 23:25:39,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.90 vs. limit=10.0 2024-08-20 23:25:40,219 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11150, loss[loss=0.1177, beats_loss=0.01089, ecapa_loss=0.000151, whisper_loss=0.1053, over 22198.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.000137, whisper_loss=0.09069, over 3887559.99 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:25:44,165 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-20 23:26:18,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5001390.0, ans=0.09899494936611666 2024-08-20 23:26:19,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5001390.0, ans=0.125 2024-08-20 23:26:34,220 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-20 23:26:44,393 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.389e+01 2.664e+01 3.018e+01 8.039e+01, threshold=5.328e+01, percent-clipped=1.0 2024-08-20 23:27:00,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.77 vs. limit=15.0 2024-08-20 23:27:09,587 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 18 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-20 23:27:12,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5001590.0, ans=0.0 2024-08-20 23:27:14,918 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11200, loss[loss=0.1128, beats_loss=0.01011, ecapa_loss=0.0001206, whisper_loss=0.1015, over 22276.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001374, whisper_loss=0.09044, over 3899050.89 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:27:24,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=5001690.0, ans=0.5 2024-08-20 23:27:43,359 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 32 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-20 23:27:58,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5001890.0, ans=0.125 2024-08-20 23:28:09,303 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-20 23:28:11,179 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 24 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-20 23:28:12,672 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
33 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 23:28:20,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5001990.0, ans=0.5 2024-08-20 23:28:47,406 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11250, loss[loss=0.1187, beats_loss=0.00984, ecapa_loss=0.0001222, whisper_loss=0.1077, over 22299.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001381, whisper_loss=0.09039, over 3871319.16 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:28:57,014 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 24 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-20 23:29:20,508 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 25 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-20 23:29:33,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2024-08-20 23:29:45,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5002490.0, ans=0.1 2024-08-20 23:29:54,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.254e+01 2.512e+01 2.929e+01 3.894e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-20 23:30:02,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5002590.0, ans=0.125 2024-08-20 23:30:07,665 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 23:30:22,465 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11300, loss[loss=0.08506, beats_loss=0.01054, ecapa_loss=0.0001638, whisper_loss=0.07289, over 19448.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001385, whisper_loss=0.09, over 3854251.26 frames. 
], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:30:25,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-08-20 23:30:54,619 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-20 23:30:56,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2024-08-20 23:31:04,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5002890.0, ans=0.125 2024-08-20 23:31:11,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0 2024-08-20 23:31:26,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5002990.0, ans=0.07 2024-08-20 23:31:44,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. limit=15.0 2024-08-20 23:32:09,474 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11350, loss[loss=0.1002, beats_loss=0.009468, ecapa_loss=0.0001451, whisper_loss=0.08932, over 20370.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001381, whisper_loss=0.08995, over 3853368.54 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:32:09,869 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 23:32:51,946 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
37 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 23:33:03,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5003390.0, ans=0.1 2024-08-20 23:33:08,423 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 34 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-20 23:33:16,692 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.267e+01 2.559e+01 2.926e+01 1.468e+02, threshold=5.117e+01, percent-clipped=1.0 2024-08-20 23:33:33,704 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-20 23:33:44,018 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11400, loss[loss=0.0956, beats_loss=0.01048, ecapa_loss=0.0001419, whisper_loss=0.0837, over 19651.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.0001387, whisper_loss=0.08947, over 3830407.65 frames. ], batch size: 78, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:33:51,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5003690.0, ans=0.0 2024-08-20 23:33:52,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5003690.0, ans=0.125 2024-08-20 23:33:56,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5003690.0, ans=0.125 2024-08-20 23:34:04,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.89 vs. limit=10.0 2024-08-20 23:34:15,181 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-20 23:34:28,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs. 
limit=15.0 2024-08-20 23:34:32,448 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 13 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-20 23:35:07,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5004090.0, ans=0.2 2024-08-20 23:35:07,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5004090.0, ans=0.0 2024-08-20 23:35:16,169 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11450, loss[loss=0.09842, beats_loss=0.01221, ecapa_loss=0.0001438, whisper_loss=0.08477, over 21879.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001377, whisper_loss=0.09038, over 3835571.12 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:35:19,777 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-20 23:35:20,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.26 vs. limit=10.0 2024-08-20 23:35:26,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5004190.0, ans=0.07 2024-08-20 23:35:40,856 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 30 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-20 23:35:41,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5004290.0, ans=0.0 2024-08-20 23:36:31,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.263e+01 2.473e+01 2.920e+01 3.744e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-20 23:36:33,809 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
22 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-20 23:36:59,833 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11500, loss[loss=0.1072, beats_loss=0.01083, ecapa_loss=0.0001391, whisper_loss=0.09501, over 21615.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.08995, over 3838006.38 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:37:27,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5004790.0, ans=0.0 2024-08-20 23:37:31,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5004790.0, ans=0.0 2024-08-20 23:37:41,745 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 25 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-20 23:38:02,362 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 17 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-20 23:38:02,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5004990.0, ans=0.125 2024-08-20 23:38:04,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5004990.0, ans=0.125 2024-08-20 23:38:06,867 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-20 23:38:13,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5005090.0, ans=0.125 2024-08-20 23:38:24,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5005090.0, ans=0.125 2024-08-20 23:38:27,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5005090.0, ans=0.025 2024-08-20 23:38:38,279 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11550, loss[loss=0.1226, beats_loss=0.009815, ecapa_loss=0.0001245, whisper_loss=0.1115, over 21673.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001388, whisper_loss=0.09023, over 3829826.12 frames. ], batch size: 83, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:38:46,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-08-20 23:39:15,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5005390.0, ans=0.125 2024-08-20 23:39:45,672 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.283e+01 2.508e+01 2.789e+01 4.307e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-20 23:39:46,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5005490.0, ans=0.07 2024-08-20 23:39:51,353 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-20 23:40:04,748 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 19 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-20 23:40:08,384 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-20 23:40:15,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2024-08-20 23:40:16,510 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11600, loss[loss=0.1125, beats_loss=0.01178, ecapa_loss=0.0001237, whisper_loss=0.09945, over 22657.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.000137, whisper_loss=0.09049, over 3824032.32 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:40:23,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-08-20 23:40:32,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-08-20 23:40:40,735 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 14 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-20 23:40:57,459 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-20 23:41:02,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5005890.0, ans=0.125 2024-08-20 23:41:28,423 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-20 23:41:50,153 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11650, loss[loss=0.1088, beats_loss=0.009464, ecapa_loss=0.0001265, whisper_loss=0.0981, over 20367.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01024, ecapa_loss=0.000138, whisper_loss=0.09064, over 3830331.68 frames. 
], batch size: 79, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:41:50,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5006190.0, ans=0.0 2024-08-20 23:42:20,625 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-20 23:42:31,105 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-20 23:42:53,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.012e+01 2.472e+01 2.706e+01 2.968e+01 4.797e+01, threshold=5.413e+01, percent-clipped=0.0 2024-08-20 23:43:09,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5006590.0, ans=0.125 2024-08-20 23:43:13,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=5006590.0, ans=10.0 2024-08-20 23:43:14,367 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-20 23:43:21,202 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11700, loss[loss=0.09977, beats_loss=0.0109, ecapa_loss=0.0001718, whisper_loss=0.08716, over 22008.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01023, ecapa_loss=0.0001382, whisper_loss=0.09069, over 3831154.43 frames. ], batch size: 94, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:43:29,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.77 vs. limit=6.0 2024-08-20 23:43:55,999 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 27 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-20 23:43:57,901 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 
23 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-20 23:44:29,887 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-20 23:44:35,253 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-20 23:44:35,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5007090.0, ans=0.0 2024-08-20 23:44:41,229 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-20 23:44:55,510 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-20 23:44:57,127 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11750, loss[loss=0.1053, beats_loss=0.008945, ecapa_loss=0.000122, whisper_loss=0.09515, over 20328.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01022, ecapa_loss=0.0001375, whisper_loss=0.0903, over 3815243.65 frames. ], batch size: 77, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:45:18,092 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 16 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-20 23:45:20,080 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 29 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-20 23:45:32,341 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 24 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-20 23:45:32,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.39 vs. 
limit=22.5 2024-08-20 23:45:35,602 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08465281873941422, model_norm_threshold=54.12553405761719 2024-08-20 23:45:35,761 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.07, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.880e+04, grad_sumsq=2.880e+04, orig_rms_sq=1.000e+00 2024-08-20 23:45:41,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5007390.0, ans=0.125 2024-08-20 23:45:47,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5007390.0, ans=0.0 2024-08-20 23:45:47,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5007390.0, ans=0.0 2024-08-20 23:45:50,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5007490.0, ans=0.1 2024-08-20 23:46:01,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.339e+01 2.549e+01 2.970e+01 6.394e+02, threshold=5.099e+01, percent-clipped=1.0 2024-08-20 23:46:02,223 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-20 23:46:02,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5007490.0, ans=0.09899494936611666 2024-08-20 23:46:02,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2024-08-20 23:46:13,710 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-20 23:46:17,674 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-20 23:46:19,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-08-20 23:46:28,047 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 18 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-20 23:46:32,068 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11800, loss[loss=0.09567, beats_loss=0.01239, ecapa_loss=0.0001344, whisper_loss=0.08194, over 22583.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01023, ecapa_loss=0.0001386, whisper_loss=0.09014, over 3799067.78 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:47:34,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2024-08-20 23:48:11,480 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11850, loss[loss=0.08047, beats_loss=0.0112, ecapa_loss=0.0001285, whisper_loss=0.06799, over 19725.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01023, ecapa_loss=0.000139, whisper_loss=0.0895, over 3765539.78 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:48:23,344 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-20 23:48:39,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=5008290.0, ans=0.95 2024-08-20 23:48:44,391 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 26 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-20 23:48:54,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.75 vs. 
limit=15.0 2024-08-20 23:49:08,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5008390.0, ans=0.125 2024-08-20 23:49:14,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2024-08-20 23:49:20,962 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.687e+01 2.312e+01 2.598e+01 2.861e+01 4.202e+01, threshold=5.196e+01, percent-clipped=0.0 2024-08-20 23:49:29,218 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 23:49:41,455 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 17 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-20 23:49:48,533 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 22 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-20 23:49:51,653 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11900, loss[loss=0.1265, beats_loss=0.01006, ecapa_loss=0.000123, whisper_loss=0.1152, over 23453.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01031, ecapa_loss=0.0001379, whisper_loss=0.08935, over 3814050.38 frames. ], batch size: 89, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:50:05,826 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-20 23:50:17,203 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-20 23:50:39,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5008890.0, ans=0.125 2024-08-20 23:50:46,370 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 23:51:00,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5008990.0, ans=0.125 2024-08-20 23:51:02,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5008990.0, ans=0.1 2024-08-20 23:51:09,997 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-20 23:51:11,971 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-20 23:51:29,136 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 11950, loss[loss=0.1001, beats_loss=0.009753, ecapa_loss=0.000125, whisper_loss=0.08911, over 24178.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01031, ecapa_loss=0.0001394, whisper_loss=0.08947, over 3803608.12 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:51:32,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5009190.0, ans=0.125 2024-08-20 23:51:37,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5009190.0, ans=0.0 2024-08-20 23:51:45,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5009190.0, ans=0.2 2024-08-20 23:52:05,162 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-20 23:52:06,566 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 
15 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-20 23:52:39,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5009490.0, ans=0.125 2024-08-20 23:52:40,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.309e+01 2.519e+01 2.758e+01 3.846e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-20 23:52:44,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5009490.0, ans=0.2 2024-08-20 23:53:05,574 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 28 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-20 23:53:07,258 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12000, loss[loss=0.1275, beats_loss=0.008604, ecapa_loss=0.0001234, whisper_loss=0.1176, over 18629.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01029, ecapa_loss=0.0001386, whisper_loss=0.09025, over 3799518.47 frames. ], batch size: 68, lr: 1.79e-03, grad_scale: 2.8823037615171174e+17 2024-08-20 23:53:07,260 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-20 23:53:44,028 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on ASR_libri: loss=0.2573, beats_loss=0, ecapa_loss=0.0005075, whisper_loss=0.2522, over 931116.00 frames. 2024-08-20 23:54:09,231 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on SV_voxceleb1: loss=0.003964, beats_loss=0, ecapa_loss=0.0003964, whisper_loss=0, over 944235.00 frames. 2024-08-20 23:55:45,703 INFO [train_multi_KD3.py:1150] (0/4) Epoch 34, validation on AT_audioset: loss=0.02298, beats_loss=0.02298, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-20 23:55:45,707 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-20 23:55:47,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. 
limit=15.0 2024-08-20 23:55:55,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5009690.0, ans=0.0 2024-08-20 23:56:05,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5009790.0, ans=0.2 2024-08-20 23:56:13,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.37 vs. limit=10.0 2024-08-20 23:56:23,777 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-20 23:57:03,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5010090.0, ans=0.125 2024-08-20 23:57:11,218 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12050, loss[loss=0.1105, beats_loss=0.008429, ecapa_loss=0.0001412, whisper_loss=0.1007, over 22035.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01029, ecapa_loss=0.0001383, whisper_loss=0.09037, over 3772894.81 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:57:17,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2024-08-20 23:57:24,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2024-08-20 23:57:30,964 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
30 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-20 23:57:42,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5010290.0, ans=0.125 2024-08-20 23:57:57,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5010390.0, ans=0.125 2024-08-20 23:58:14,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.275e+01 2.516e+01 2.859e+01 1.031e+02, threshold=5.032e+01, percent-clipped=2.0 2024-08-20 23:58:29,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5010590.0, ans=0.0 2024-08-20 23:58:29,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5010590.0, ans=0.0 2024-08-20 23:58:34,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5010590.0, ans=0.125 2024-08-20 23:58:39,939 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12100, loss[loss=0.1106, beats_loss=0.009823, ecapa_loss=0.0001465, whisper_loss=0.09928, over 22712.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001379, whisper_loss=0.08965, over 3795237.14 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-20 23:58:45,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5010690.0, ans=0.0 2024-08-20 23:58:58,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5010790.0, ans=0.0 2024-08-20 23:59:06,086 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-20 23:59:10,075 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-20 23:59:14,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5010890.0, ans=0.125 2024-08-20 23:59:19,477 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-20 23:59:19,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5010890.0, ans=0.125 2024-08-20 23:59:48,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5010990.0, ans=0.2 2024-08-20 23:59:52,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5011090.0, ans=0.1 2024-08-21 00:00:10,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.23 vs. limit=22.5 2024-08-21 00:00:12,309 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12150, loss[loss=0.1083, beats_loss=0.01085, ecapa_loss=0.000151, whisper_loss=0.09592, over 21562.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001382, whisper_loss=0.0905, over 3869400.95 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:00:17,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5011190.0, ans=0.125
2024-08-21 00:00:55,845 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:00:55,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5011390.0, ans=0.125
2024-08-21 00:01:03,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5011390.0, ans=0.0
2024-08-21 00:01:04,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5011390.0, ans=0.035
2024-08-21 00:01:04,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5011390.0, ans=0.2
2024-08-21 00:01:10,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs.
limit=15.0
2024-08-21 00:01:18,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.230e+01 2.539e+01 2.959e+01 2.449e+02, threshold=5.079e+01, percent-clipped=2.0
2024-08-21 00:01:18,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5011490.0, ans=0.0
2024-08-21 00:01:35,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5011590.0, ans=0.125
2024-08-21 00:01:42,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5011590.0, ans=0.2
2024-08-21 00:01:45,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5011690.0, ans=0.125
2024-08-21 00:01:46,154 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12200, loss[loss=0.102, beats_loss=0.008067, ecapa_loss=0.0001463, whisper_loss=0.0925, over 16163.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.000139, whisper_loss=0.09093, over 3882629.75 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:01:50,060 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 17 from LS+wenet, 23 from Vox, 36 from AS
2024-08-21 00:02:09,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5011790.0, ans=0.0
2024-08-21 00:02:11,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5011790.0, ans=0.1
2024-08-21 00:02:17,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=22.5
2024-08-21 00:02:35,312 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts.
20 from LS+wenet, 21 from Vox, 29 from AS
2024-08-21 00:02:37,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5011890.0, ans=0.125
2024-08-21 00:03:02,063 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 16 from LS+wenet, 24 from Vox, 27 from AS
2024-08-21 00:03:18,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5012090.0, ans=0.125
2024-08-21 00:03:25,943 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 from AS
2024-08-21 00:03:34,188 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12250, loss[loss=0.09791, beats_loss=0.008083, ecapa_loss=0.0001219, whisper_loss=0.0886, over 15392.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001392, whisper_loss=0.09068, over 3833103.90 frames. ], batch size: 56, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:03:43,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5012190.0, ans=0.125
2024-08-21 00:03:45,153 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 20 from LS+wenet, 14 from Vox, 34 from AS
2024-08-21 00:03:54,944 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 from AS
2024-08-21 00:03:57,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0
2024-08-21 00:04:03,449 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 23 from LS+wenet, 14 from Vox, 20 from AS
2024-08-21 00:04:03,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5012290.0, ans=0.125
2024-08-21 00:04:09,278 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts.
33 from LS+wenet, 25 from Vox, 37 from AS
2024-08-21 00:04:12,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5012390.0, ans=0.125
2024-08-21 00:04:15,027 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 from AS
2024-08-21 00:04:22,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5012390.0, ans=0.125
2024-08-21 00:04:31,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5012490.0, ans=0.0
2024-08-21 00:04:37,453 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 from AS
2024-08-21 00:04:38,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.682e+01 2.216e+01 2.475e+01 2.860e+01 1.621e+02, threshold=4.950e+01, percent-clipped=3.0
2024-08-21 00:05:05,777 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12300, loss[loss=0.1184, beats_loss=0.01054, ecapa_loss=0.0001304, whisper_loss=0.1065, over 15701.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001395, whisper_loss=0.0902, over 3830293.33 frames. ], batch size: 61, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:05:34,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5012790.0, ans=0.0
2024-08-21 00:05:37,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5012790.0, ans=0.0
2024-08-21 00:05:50,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5012890.0, ans=0.0
2024-08-21 00:05:59,065 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts.
23 from LS+wenet, 21 from Vox, 46 from AS
2024-08-21 00:06:07,749 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 from AS
2024-08-21 00:06:22,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5012990.0, ans=0.0
2024-08-21 00:06:43,074 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12350, loss[loss=0.09983, beats_loss=0.01054, ecapa_loss=0.0001538, whisper_loss=0.08775, over 16744.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001393, whisper_loss=0.09035, over 3833045.20 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:07:03,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5013290.0, ans=0.125
2024-08-21 00:07:08,069 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0743364468216896, model_norm_threshold=49.50318145751953
2024-08-21 00:07:08,226 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.289e+04, grad_sumsq=4.289e+04, orig_rms_sq=1.000e+00
2024-08-21 00:07:09,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5013290.0, ans=0.125
2024-08-21 00:07:22,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5013390.0, ans=0.1
2024-08-21 00:07:37,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5013490.0, ans=0.2
2024-08-21 00:07:41,936 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 19 from LS+wenet, 19 from Vox, 31 from AS
2024-08-21 00:07:43,932 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts.
25 from LS+wenet, 17 from Vox, 39 from AS
2024-08-21 00:07:45,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.283e+01 2.548e+01 2.937e+01 6.659e+02, threshold=5.096e+01, percent-clipped=4.0
2024-08-21 00:07:53,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0
2024-08-21 00:08:10,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5013690.0, ans=0.1
2024-08-21 00:08:12,653 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12400, loss[loss=0.09525, beats_loss=0.01223, ecapa_loss=0.0001202, whisper_loss=0.08181, over 21287.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001388, whisper_loss=0.09023, over 3803317.50 frames. ], batch size: 85, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:08:27,217 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:08:27,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5013690.0, ans=0.1
2024-08-21 00:08:27,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5013690.0, ans=0.1
2024-08-21 00:08:33,874 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 18 from LS+wenet, 26 from Vox, 33 from AS
2024-08-21 00:08:55,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5013890.0, ans=0.125
2024-08-21 00:09:32,388 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 15 from LS+wenet, 19 from Vox, 34 from AS
2024-08-21 00:09:44,080 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts.
25 from LS+wenet, 23 from Vox, 37 from AS
2024-08-21 00:09:47,373 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12450, loss[loss=0.1154, beats_loss=0.009371, ecapa_loss=0.000134, whisper_loss=0.1047, over 18143.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001392, whisper_loss=0.09047, over 3798389.62 frames. ], batch size: 71, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:09:47,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5014190.0, ans=0.2
2024-08-21 00:09:48,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0
2024-08-21 00:09:51,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5014190.0, ans=0.0
2024-08-21 00:10:10,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5014290.0, ans=0.2
2024-08-21 00:10:14,009 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 26 from LS+wenet, 27 from Vox, 42 from AS
2024-08-21 00:10:24,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5014390.0, ans=0.1
2024-08-21 00:10:51,941 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.269e+01 2.484e+01 2.743e+01 3.672e+01, threshold=4.968e+01, percent-clipped=0.0
2024-08-21 00:10:59,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0
2024-08-21 00:11:19,801 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12500, loss[loss=0.1101, beats_loss=0.008068, ecapa_loss=0.0001506, whisper_loss=0.1005, over 16189.00 frames.
], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001398, whisper_loss=0.09049, over 3786545.13 frames. ], batch size: 62, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:11:45,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5014790.0, ans=0.125
2024-08-21 00:11:49,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.48 vs. limit=22.5
2024-08-21 00:12:06,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=5014890.0, ans=0.2
2024-08-21 00:12:13,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5014890.0, ans=0.0
2024-08-21 00:12:21,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5014990.0, ans=0.04949747468305833
2024-08-21 00:12:39,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5015090.0, ans=0.125
2024-08-21 00:12:41,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5015090.0, ans=0.125
2024-08-21 00:12:53,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0
2024-08-21 00:12:55,670 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12550, loss[loss=0.1089, beats_loss=0.008076, ecapa_loss=0.0001282, whisper_loss=0.09955, over 18151.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.08947, over 3772098.14 frames.
], batch size: 68, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:13:07,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5015190.0, ans=0.05
2024-08-21 00:13:11,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5015190.0, ans=0.0
2024-08-21 00:13:49,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5015490.0, ans=0.125
2024-08-21 00:14:00,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.267e+01 2.470e+01 2.808e+01 4.015e+01, threshold=4.940e+01, percent-clipped=0.0
2024-08-21 00:14:11,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5015590.0, ans=0.1
2024-08-21 00:14:25,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0
2024-08-21 00:14:28,233 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12600, loss[loss=0.1058, beats_loss=0.009788, ecapa_loss=0.0001814, whisper_loss=0.09416, over 18006.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001395, whisper_loss=0.0897, over 3797773.82 frames. ], batch size: 75, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:14:32,020 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts.
28 from LS+wenet, 15 from Vox, 35 from AS
2024-08-21 00:14:41,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5015690.0, ans=0.125
2024-08-21 00:15:13,445 WARNING [optim.py:496] (0/4) Scaling gradients by 0.00705720903351903, model_norm_threshold=49.39711380004883
2024-08-21 00:15:13,618 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.280e+06, grad_sumsq=7.678e+08, orig_rms_sq=1.078e-02
2024-08-21 00:15:22,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5015990.0, ans=0.1
2024-08-21 00:15:44,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5016090.0, ans=0.1
2024-08-21 00:15:47,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0
2024-08-21 00:15:54,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5016090.0, ans=10.0
2024-08-21 00:16:01,763 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12650, loss[loss=0.1085, beats_loss=0.008392, ecapa_loss=0.0001624, whisper_loss=0.09846, over 21577.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.000139, whisper_loss=0.09006, over 3829151.86 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:16:16,083 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 from AS
2024-08-21 00:16:30,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.62 vs.
limit=22.5
2024-08-21 00:16:38,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5016390.0, ans=0.1
2024-08-21 00:16:40,047 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 from AS
2024-08-21 00:16:57,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5016490.0, ans=0.0
2024-08-21 00:16:58,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5016490.0, ans=0.0
2024-08-21 00:17:07,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.324e+01 2.529e+01 2.803e+01 7.000e+03, threshold=5.059e+01, percent-clipped=4.0
2024-08-21 00:17:22,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5016590.0, ans=0.035
2024-08-21 00:17:34,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5016590.0, ans=0.125
2024-08-21 00:17:36,744 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 from AS
2024-08-21 00:17:38,222 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12700, loss[loss=0.08953, beats_loss=0.01068, ecapa_loss=0.0001267, whisper_loss=0.07759, over 23259.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001392, whisper_loss=0.08921, over 3828573.17 frames. ], batch size: 94, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:17:41,495 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 21 from LS+wenet, 17 from Vox, 41 from AS
2024-08-21 00:17:47,119 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts.
32 from LS+wenet, 18 from Vox, 28 from AS
2024-08-21 00:17:57,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5016790.0, ans=0.125
2024-08-21 00:18:10,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5016790.0, ans=0.125
2024-08-21 00:18:13,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5016890.0, ans=0.125
2024-08-21 00:18:29,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5016890.0, ans=0.125
2024-08-21 00:18:39,187 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 14 from LS+wenet, 18 from Vox, 30 from AS
2024-08-21 00:18:40,812 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 22 from LS+wenet, 14 from Vox, 20 from AS
2024-08-21 00:18:41,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5016990.0, ans=0.0
2024-08-21 00:18:58,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5017090.0, ans=0.125
2024-08-21 00:19:00,764 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 from AS
2024-08-21 00:19:01,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.71 vs. limit=15.0
2024-08-21 00:19:11,151 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12750, loss[loss=0.08468, beats_loss=0.009457, ecapa_loss=0.0001599, whisper_loss=0.07362, over 18033.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.000138, whisper_loss=0.08982, over 3814586.16 frames.
], batch size: 74, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:19:25,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5017190.0, ans=0.125
2024-08-21 00:19:38,829 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 from AS
2024-08-21 00:19:53,227 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 29 from LS+wenet, 14 from Vox, 27 from AS
2024-08-21 00:20:01,703 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS
2024-08-21 00:20:16,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5017490.0, ans=0.95
2024-08-21 00:20:18,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=12.0
2024-08-21 00:20:19,284 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.263e+01 2.506e+01 2.738e+01 4.032e+01, threshold=5.011e+01, percent-clipped=0.0
2024-08-21 00:20:21,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5017490.0, ans=0.125
2024-08-21 00:20:33,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=5017590.0, ans=0.05
2024-08-21 00:20:33,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5017590.0, ans=0.125
2024-08-21 00:20:46,984 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12800, loss[loss=0.1138, beats_loss=0.008986, ecapa_loss=0.0001539, whisper_loss=0.1032, over 22056.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001389, whisper_loss=0.08987, over 3803092.57 frames.
], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:20:49,208 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 30 from LS+wenet, 18 from Vox, 38 from AS
2024-08-21 00:21:04,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5017790.0, ans=0.0
2024-08-21 00:21:09,969 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS
2024-08-21 00:21:18,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0
2024-08-21 00:21:50,644 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 from AS
2024-08-21 00:22:21,461 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 from AS
2024-08-21 00:22:25,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0
2024-08-21 00:22:29,461 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12850, loss[loss=0.111, beats_loss=0.0102, ecapa_loss=0.0001453, whisper_loss=0.09931, over 20449.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001385, whisper_loss=0.08997, over 3835861.68 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:22:43,776 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 13 from LS+wenet, 12 from Vox, 32 from AS
2024-08-21 00:22:44,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5018190.0, ans=0.0
2024-08-21 00:23:00,589 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 20 from LS+wenet, 15 from Vox, 26 from AS
2024-08-21 00:23:10,430 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts.
27 from LS+wenet, 25 from Vox, 38 from AS
2024-08-21 00:23:30,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=22.5
2024-08-21 00:23:38,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.266e+01 2.519e+01 2.755e+01 3.962e+01, threshold=5.039e+01, percent-clipped=0.0
2024-08-21 00:24:00,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=5018590.0, ans=15.0
2024-08-21 00:24:11,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5018690.0, ans=0.07
2024-08-21 00:24:12,758 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12900, loss[loss=0.1095, beats_loss=0.00768, ecapa_loss=0.0001498, whisper_loss=0.1004, over 20881.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001383, whisper_loss=0.09014, over 3821540.33 frames. ], batch size: 81, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:24:22,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5018690.0, ans=0.125
2024-08-21 00:24:25,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs.
limit=15.0
2024-08-21 00:24:29,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5018790.0, ans=0.2
2024-08-21 00:24:45,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=5018790.0, ans=15.0
2024-08-21 00:24:55,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5018890.0, ans=0.0
2024-08-21 00:24:59,438 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS
2024-08-21 00:25:18,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5018990.0, ans=0.125
2024-08-21 00:25:30,041 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 from AS
2024-08-21 00:25:33,156 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 13 from LS+wenet, 18 from Vox, 22 from AS
2024-08-21 00:25:41,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5019090.0, ans=0.125
2024-08-21 00:25:43,327 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 33 from Vox, 29 from AS
2024-08-21 00:25:48,042 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 12950, loss[loss=0.1162, beats_loss=0.0092, ecapa_loss=0.0001153, whisper_loss=0.1058, over 16362.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001378, whisper_loss=0.09029, over 3803926.28 frames. ], batch size: 59, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:25:50,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5019190.0, ans=0.2
2024-08-21 00:25:53,863 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts.
28 from LS+wenet, 18 from Vox, 48 from AS
2024-08-21 00:26:12,653 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 12 from LS+wenet, 17 from Vox, 26 from AS
2024-08-21 00:26:26,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5019290.0, ans=0.125
2024-08-21 00:26:59,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.575e+01 2.232e+01 2.418e+01 2.707e+01 3.600e+01, threshold=4.835e+01, percent-clipped=0.0
2024-08-21 00:27:00,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5019490.0, ans=0.125
2024-08-21 00:27:03,948 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS
2024-08-21 00:27:13,679 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.280e+05
2024-08-21 00:27:15,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5019590.0, ans=0.5
2024-08-21 00:27:16,869 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-21 00:27:33,074 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13000, loss[loss=0.1122, beats_loss=0.01182, ecapa_loss=0.0001025, whisper_loss=0.09939, over 23407.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.000137, whisper_loss=0.09023, over 3817099.86 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:27:34,710 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 from AS
2024-08-21 00:28:13,959 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts.
16 from LS+wenet, 16 from Vox, 21 from AS
2024-08-21 00:28:22,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0
2024-08-21 00:28:45,601 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 from AS
2024-08-21 00:28:55,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5
2024-08-21 00:28:55,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0
2024-08-21 00:29:07,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5020090.0, ans=0.1
2024-08-21 00:29:09,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5020190.0, ans=0.125
2024-08-21 00:29:10,463 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13050, loss[loss=0.09293, beats_loss=0.01096, ecapa_loss=0.0001426, whisper_loss=0.08054, over 22198.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001378, whisper_loss=0.08962, over 3792696.90 frames. ], batch size: 91, lr: 1.79e-03, grad_scale: 5.764607523034235e+17
2024-08-21 00:29:14,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.97 vs.
limit=15.0 2024-08-21 00:29:35,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5020290.0, ans=0.125 2024-08-21 00:29:53,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5020390.0, ans=0.125 2024-08-21 00:30:03,992 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 22 from LS+wenet, 9 from Vox, 20 fro AS 2024-08-21 00:30:06,208 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 00:30:15,770 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.250e+01 2.442e+01 2.810e+01 6.229e+01, threshold=4.884e+01, percent-clipped=1.0 2024-08-21 00:30:17,467 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 00:30:23,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5020590.0, ans=0.125 2024-08-21 00:30:34,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=12.0 2024-08-21 00:30:44,114 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13100, loss[loss=0.116, beats_loss=0.0104, ecapa_loss=0.0001241, whisper_loss=0.1044, over 19089.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01063, ecapa_loss=0.0001384, whisper_loss=0.0889, over 3788541.23 frames. ], batch size: 73, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:30:46,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=12.0 2024-08-21 00:31:02,296 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
28 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-21 00:31:02,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5020790.0, ans=0.0 2024-08-21 00:31:07,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5020790.0, ans=0.1 2024-08-21 00:31:26,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2024-08-21 00:31:58,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5020990.0, ans=0.0 2024-08-21 00:32:13,908 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09580767154693604, model_norm_threshold=48.835636138916016 2024-08-21 00:32:14,067 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.692e+04, grad_sumsq=4.692e+04, orig_rms_sq=1.000e+00 2024-08-21 00:32:19,225 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13150, loss[loss=0.09693, beats_loss=0.007264, ecapa_loss=0.0001509, whisper_loss=0.08815, over 13630.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01064, ecapa_loss=0.0001388, whisper_loss=0.08876, over 3783794.00 frames. ], batch size: 50, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:32:50,015 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-21 00:32:58,532 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 17 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-21 00:33:00,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.45 vs. 
limit=12.0 2024-08-21 00:33:06,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5021390.0, ans=0.0 2024-08-21 00:33:18,956 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-21 00:33:22,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.398e+01 2.549e+01 2.951e+01 5.097e+02, threshold=5.098e+01, percent-clipped=2.0 2024-08-21 00:33:28,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5021490.0, ans=0.125 2024-08-21 00:33:44,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5021590.0, ans=0.1 2024-08-21 00:33:44,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-21 00:33:52,576 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13200, loss[loss=0.09407, beats_loss=0.01179, ecapa_loss=0.0001348, whisper_loss=0.08094, over 15977.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01062, ecapa_loss=0.0001381, whisper_loss=0.0883, over 3788797.49 frames. ], batch size: 64, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:33:56,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=22.5 2024-08-21 00:34:21,351 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 00:34:50,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5021890.0, ans=0.0 2024-08-21 00:34:54,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5021990.0, ans=0.125 2024-08-21 00:35:08,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5021990.0, ans=0.2 2024-08-21 00:35:16,609 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 27 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-21 00:35:16,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5022090.0, ans=0.0 2024-08-21 00:35:34,092 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13250, loss[loss=0.1248, beats_loss=0.006755, ecapa_loss=0.0001587, whisper_loss=0.1165, over 14969.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001385, whisper_loss=0.08929, over 3770094.34 frames. 
], batch size: 58, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:35:38,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5022190.0, ans=0.125 2024-08-21 00:35:57,970 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 00:36:17,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5022390.0, ans=0.125 2024-08-21 00:36:20,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5022390.0, ans=0.0 2024-08-21 00:36:28,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5022490.0, ans=0.125 2024-08-21 00:36:37,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.371e+01 2.560e+01 2.906e+01 3.702e+02, threshold=5.119e+01, percent-clipped=3.0 2024-08-21 00:36:40,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5022490.0, ans=0.0 2024-08-21 00:36:45,212 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 00:36:45,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5022490.0, ans=0.125 2024-08-21 00:37:08,373 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13300, loss[loss=0.09859, beats_loss=0.01063, ecapa_loss=0.0001128, whisper_loss=0.08684, over 22816.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001385, whisper_loss=0.08938, over 3754532.48 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:37:12,604 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 00:37:21,392 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 18 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-21 00:37:35,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5022790.0, ans=0.0 2024-08-21 00:38:02,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2024-08-21 00:38:10,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2024-08-21 00:38:13,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5022990.0, ans=0.2 2024-08-21 00:38:18,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5022990.0, ans=0.125 2024-08-21 00:38:43,911 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13350, loss[loss=0.07972, beats_loss=0.01081, ecapa_loss=0.0001613, whisper_loss=0.06729, over 17476.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08923, over 3737708.47 frames. ], batch size: 72, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:39:26,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5023390.0, ans=0.0 2024-08-21 00:39:39,674 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
15 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 00:39:39,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5023490.0, ans=0.125 2024-08-21 00:39:49,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.315e+01 2.502e+01 2.886e+01 3.923e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-21 00:39:50,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-08-21 00:40:05,079 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 17 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-21 00:40:09,776 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 00:40:18,738 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13400, loss[loss=0.09767, beats_loss=0.009799, ecapa_loss=0.0001278, whisper_loss=0.0866, over 22854.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001393, whisper_loss=0.08894, over 3757759.22 frames. ], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:40:31,269 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-21 00:40:35,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5023790.0, ans=0.125 2024-08-21 00:40:39,870 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-21 00:40:54,272 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 00:41:11,727 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 21 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-21 00:41:19,174 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-21 00:41:46,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2024-08-21 00:41:47,586 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13450, loss[loss=0.1003, beats_loss=0.00962, ecapa_loss=0.0001187, whisper_loss=0.08953, over 21092.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001385, whisper_loss=0.08911, over 3794563.70 frames. ], batch size: 82, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:41:48,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5024190.0, ans=0.125 2024-08-21 00:41:49,664 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-21 00:41:54,463 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 00:42:11,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0 2024-08-21 00:42:13,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5024290.0, ans=0.04949747468305833 2024-08-21 00:42:28,419 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
24 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-21 00:42:35,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5024390.0, ans=0.1 2024-08-21 00:42:37,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5024390.0, ans=0.125 2024-08-21 00:42:44,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-08-21 00:42:47,285 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-21 00:42:49,038 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 00:42:51,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.706e+01 2.233e+01 2.383e+01 2.688e+01 3.683e+01, threshold=4.765e+01, percent-clipped=0.0 2024-08-21 00:43:19,825 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 15 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-21 00:43:20,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5024590.0, ans=0.125 2024-08-21 00:43:22,788 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13500, loss[loss=0.1057, beats_loss=0.009458, ecapa_loss=0.0001614, whisper_loss=0.0946, over 22673.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001396, whisper_loss=0.08887, over 3773739.17 frames. ], batch size: 95, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:43:27,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-08-21 00:43:34,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.06 vs. 
limit=22.5 2024-08-21 00:43:37,889 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-21 00:43:46,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5024790.0, ans=0.125 2024-08-21 00:43:47,862 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-21 00:44:10,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5024890.0, ans=0.0 2024-08-21 00:44:27,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=5024990.0, ans=15.0 2024-08-21 00:44:32,788 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 29 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-21 00:44:40,867 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 20 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-21 00:44:59,201 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13550, loss[loss=0.1265, beats_loss=0.00723, ecapa_loss=0.0001245, whisper_loss=0.118, over 23206.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001378, whisper_loss=0.08967, over 3784834.26 frames. ], batch size: 88, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:45:00,132 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 19 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 00:45:59,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.19 vs. limit=10.0 2024-08-21 00:46:07,716 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.237e+01 2.540e+01 2.882e+01 4.860e+01, threshold=5.081e+01, percent-clipped=1.0 2024-08-21 00:46:17,650 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-21 00:46:33,841 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-21 00:46:34,833 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13600, loss[loss=0.1073, beats_loss=0.009707, ecapa_loss=0.0001635, whisper_loss=0.09597, over 21578.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.08922, over 3765944.25 frames. ], batch size: 93, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:47:19,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5025890.0, ans=0.1 2024-08-21 00:47:26,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5025890.0, ans=0.0 2024-08-21 00:47:56,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5026090.0, ans=0.1 2024-08-21 00:48:09,541 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13650, loss[loss=0.1039, beats_loss=0.01123, ecapa_loss=0.0001327, whisper_loss=0.09131, over 22642.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01037, ecapa_loss=0.000139, whisper_loss=0.08913, over 3755845.04 frames. 
], batch size: 90, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:48:22,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5026190.0, ans=0.125 2024-08-21 00:48:29,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5026290.0, ans=0.125 2024-08-21 00:48:51,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5026390.0, ans=0.0 2024-08-21 00:48:59,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5026390.0, ans=0.125 2024-08-21 00:49:15,848 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 20 from LS+wenet, 20 from Vox, 53 fro AS 2024-08-21 00:49:22,983 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.304e+01 2.539e+01 2.805e+01 5.664e+01, threshold=5.078e+01, percent-clipped=1.0 2024-08-21 00:49:32,706 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 21 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-21 00:49:49,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5026590.0, ans=0.1 2024-08-21 00:49:52,285 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13700, loss[loss=0.1044, beats_loss=0.01057, ecapa_loss=0.0001359, whisper_loss=0.09246, over 19423.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01045, ecapa_loss=0.0001387, whisper_loss=0.08846, over 3777841.95 frames. ], batch size: 78, lr: 1.79e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:49:58,576 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-21 00:50:01,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5026690.0, ans=0.5 2024-08-21 00:50:03,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-08-21 00:50:19,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5026790.0, ans=0.125 2024-08-21 00:50:27,934 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-21 00:50:38,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5026890.0, ans=0.125 2024-08-21 00:50:45,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5026890.0, ans=0.2 2024-08-21 00:51:03,830 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 13 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-21 00:51:19,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-08-21 00:51:32,321 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13750, loss[loss=0.1139, beats_loss=0.009597, ecapa_loss=0.0001613, whisper_loss=0.1027, over 19911.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01039, ecapa_loss=0.0001395, whisper_loss=0.08911, over 3783979.42 frames. 
], batch size: 81, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:51:36,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5027190.0, ans=0.07 2024-08-21 00:51:36,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5027190.0, ans=0.1 2024-08-21 00:51:38,838 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 23 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-21 00:51:54,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5027290.0, ans=0.125 2024-08-21 00:52:10,509 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 00:52:13,730 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-21 00:52:15,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5027390.0, ans=0.125 2024-08-21 00:52:29,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5027390.0, ans=0.125 2024-08-21 00:52:45,991 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.353e+01 2.699e+01 3.002e+01 5.030e+02, threshold=5.398e+01, percent-clipped=2.0 2024-08-21 00:52:54,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5027490.0, ans=0.125 2024-08-21 00:53:03,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5027590.0, ans=0.1 2024-08-21 00:53:16,610 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13800, loss[loss=0.08159, beats_loss=0.01212, ecapa_loss=0.0001076, whisper_loss=0.06839, over 15502.00 
frames. ], tot_loss[loss=0.1005, beats_loss=0.01045, ecapa_loss=0.0001391, whisper_loss=0.08867, over 3793416.73 frames. ], batch size: 62, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:53:34,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5027790.0, ans=0.125 2024-08-21 00:53:40,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=12.0 2024-08-21 00:53:50,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5027790.0, ans=0.0 2024-08-21 00:53:50,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5027790.0, ans=0.1 2024-08-21 00:53:51,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5027790.0, ans=0.125 2024-08-21 00:54:04,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-08-21 00:54:21,252 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-21 00:54:49,142 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13850, loss[loss=0.1044, beats_loss=0.009337, ecapa_loss=0.0001283, whisper_loss=0.09381, over 19218.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01046, ecapa_loss=0.0001384, whisper_loss=0.08949, over 3793529.73 frames. ], batch size: 75, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:55:24,193 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
34 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-21 00:55:24,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5028290.0, ans=0.125 2024-08-21 00:55:30,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5028390.0, ans=0.125 2024-08-21 00:55:40,592 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 28 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 00:55:58,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.193e+01 2.425e+01 2.661e+01 8.724e+01, threshold=4.850e+01, percent-clipped=1.0 2024-08-21 00:56:02,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5028490.0, ans=0.0 2024-08-21 00:56:10,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5028590.0, ans=0.0 2024-08-21 00:56:13,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5028590.0, ans=0.125 2024-08-21 00:56:16,802 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 00:56:22,142 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-21 00:56:25,530 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13900, loss[loss=0.1225, beats_loss=0.008719, ecapa_loss=0.0001278, whisper_loss=0.1125, over 17298.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0105, ecapa_loss=0.0001378, whisper_loss=0.08934, over 3789107.65 frames. 
], batch size: 64, lr: 1.78e-03, grad_scale: 5.764607523034235e+17 2024-08-21 00:56:31,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5028690.0, ans=0.125 2024-08-21 00:56:43,966 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.087e+01 2024-08-21 00:56:51,505 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 19 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-21 00:56:54,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5028790.0, ans=0.125 2024-08-21 00:57:02,957 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 00:57:11,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=12.0 2024-08-21 00:57:19,234 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-21 00:57:28,380 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 22 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 00:57:38,715 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 00:57:39,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-21 00:57:54,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5029090.0, ans=0.0 2024-08-21 00:57:57,546 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 13950, loss[loss=0.08106, beats_loss=0.01311, ecapa_loss=0.0001447, whisper_loss=0.0665, over 21784.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01052, ecapa_loss=0.0001387, whisper_loss=0.08889, over 3789392.08 frames. 
], batch size: 91, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 00:58:01,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.00 vs. limit=15.0 2024-08-21 00:58:10,228 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 00:58:23,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5029290.0, ans=0.015 2024-08-21 00:58:23,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5029290.0, ans=0.2 2024-08-21 00:58:46,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5029390.0, ans=0.025 2024-08-21 00:58:58,619 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 00:59:04,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5029490.0, ans=0.2 2024-08-21 00:59:12,613 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.187e+01 2.492e+01 2.707e+01 4.586e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 00:59:13,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5029490.0, ans=0.125 2024-08-21 00:59:17,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5029490.0, ans=0.125 2024-08-21 00:59:30,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5029590.0, ans=0.125 2024-08-21 00:59:38,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5029590.0, ans=0.2 2024-08-21 00:59:43,119 INFO 
[train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14000, loss[loss=0.09211, beats_loss=0.009039, ecapa_loss=0.0001378, whisper_loss=0.0817, over 20613.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01053, ecapa_loss=0.0001388, whisper_loss=0.08909, over 3803845.71 frames. ], batch size: 84, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:00:42,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5029890.0, ans=0.125 2024-08-21 01:00:53,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5029990.0, ans=0.04949747468305833 2024-08-21 01:00:56,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5029990.0, ans=0.125 2024-08-21 01:00:59,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5029990.0, ans=0.125 2024-08-21 01:01:09,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5029990.0, ans=0.0 2024-08-21 01:01:30,782 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14050, loss[loss=0.1039, beats_loss=0.01025, ecapa_loss=0.0001635, whisper_loss=0.092, over 20791.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.000139, whisper_loss=0.08974, over 3796354.53 frames. ], batch size: 84, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:01:37,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.05 vs. limit=6.0 2024-08-21 01:01:54,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.86 vs. limit=5.0 2024-08-21 01:01:58,306 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-21 01:02:09,938 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 12 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-21 01:02:10,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=12.0 2024-08-21 01:02:12,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5030390.0, ans=0.125 2024-08-21 01:02:42,134 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.263e+01 2.516e+01 2.757e+01 1.194e+02, threshold=5.032e+01, percent-clipped=1.0 2024-08-21 01:02:53,153 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 25 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 01:02:53,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5030590.0, ans=0.0 2024-08-21 01:02:59,698 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 01:03:13,522 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14100, loss[loss=0.1131, beats_loss=0.01069, ecapa_loss=0.000126, whisper_loss=0.1012, over 22149.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001386, whisper_loss=0.09009, over 3822610.83 frames. 
], batch size: 86, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:03:26,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5030690.0, ans=0.125 2024-08-21 01:03:34,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5030790.0, ans=0.125 2024-08-21 01:03:38,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5030790.0, ans=0.2 2024-08-21 01:03:42,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5030790.0, ans=0.1 2024-08-21 01:04:05,097 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-21 01:04:32,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5031090.0, ans=0.05 2024-08-21 01:04:50,691 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14150, loss[loss=0.11, beats_loss=0.008381, ecapa_loss=0.0001297, whisper_loss=0.1003, over 17556.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001397, whisper_loss=0.09082, over 3839321.90 frames. ], batch size: 66, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:05:07,542 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-21 01:05:34,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.24 vs. 
limit=15.0 2024-08-21 01:05:37,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5031390.0, ans=0.09899494936611666 2024-08-21 01:05:57,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0 2024-08-21 01:06:02,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.285e+01 2.566e+01 2.926e+01 5.021e+02, threshold=5.132e+01, percent-clipped=5.0 2024-08-21 01:06:09,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-08-21 01:06:35,450 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14200, loss[loss=0.08896, beats_loss=0.01005, ecapa_loss=0.0001248, whisper_loss=0.07766, over 13775.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0103, ecapa_loss=0.0001396, whisper_loss=0.09099, over 3816633.81 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:06:41,581 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 01:06:47,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5031690.0, ans=0.0 2024-08-21 01:07:11,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2024-08-21 01:07:33,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5031990.0, ans=0.125 2024-08-21 01:07:38,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.90 vs. 
limit=15.0 2024-08-21 01:07:39,997 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 01:07:57,729 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-21 01:08:01,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5032090.0, ans=0.1 2024-08-21 01:08:02,744 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-21 01:08:09,672 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14250, loss[loss=0.09959, beats_loss=0.00976, ecapa_loss=0.0001248, whisper_loss=0.08859, over 15932.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001383, whisper_loss=0.09059, over 3784591.92 frames. ], batch size: 61, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:08:30,312 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 19 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 01:08:34,209 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-21 01:08:50,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2024-08-21 01:09:16,631 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.235e+01 2.454e+01 2.805e+01 6.761e+01, threshold=4.908e+01, percent-clipped=2.0 2024-08-21 01:09:20,707 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-21 01:09:26,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5032590.0, ans=0.0 2024-08-21 01:09:28,473 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 01:09:32,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5032590.0, ans=0.0 2024-08-21 01:09:39,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2024-08-21 01:09:42,151 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14300, loss[loss=0.1058, beats_loss=0.01185, ecapa_loss=0.0001411, whisper_loss=0.09255, over 21978.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01034, ecapa_loss=0.0001385, whisper_loss=0.09065, over 3787560.96 frames. ], batch size: 89, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:09:48,110 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 01:09:51,881 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 01:10:01,294 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-21 01:10:04,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5032790.0, ans=0.1 2024-08-21 01:10:08,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5032790.0, ans=0.1 2024-08-21 01:10:32,075 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 21 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-21 01:10:38,807 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 01:10:42,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5032990.0, ans=0.0 2024-08-21 01:10:54,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5032990.0, ans=0.125 2024-08-21 01:11:07,677 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-21 01:11:15,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-21 01:11:18,144 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14350, loss[loss=0.1019, beats_loss=0.0102, ecapa_loss=0.0001396, whisper_loss=0.09026, over 14358.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01027, ecapa_loss=0.0001385, whisper_loss=0.09068, over 3777144.62 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:11:40,234 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-21 01:11:56,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5033390.0, ans=0.2 2024-08-21 01:12:07,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5033390.0, ans=0.0 2024-08-21 01:12:16,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5033490.0, ans=0.1 2024-08-21 01:12:19,522 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
16 from LS+wenet, 24 from Vox, 21 fro AS 2024-08-21 01:12:19,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5033490.0, ans=0.1 2024-08-21 01:12:23,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5033490.0, ans=0.1 2024-08-21 01:12:24,438 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.295e+01 2.538e+01 2.825e+01 4.751e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-21 01:12:49,773 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14400, loss[loss=0.1023, beats_loss=0.01173, ecapa_loss=0.0001184, whisper_loss=0.08936, over 22825.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01026, ecapa_loss=0.0001392, whisper_loss=0.09081, over 3803939.54 frames. ], batch size: 92, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:13:37,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5033890.0, ans=0.125 2024-08-21 01:13:42,372 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 01:13:44,602 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 01:13:53,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5033990.0, ans=0.125 2024-08-21 01:14:00,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5033990.0, ans=0.125 2024-08-21 01:14:04,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.09 vs. 
limit=15.0 2024-08-21 01:14:10,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5 2024-08-21 01:14:24,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5034090.0, ans=0.1 2024-08-21 01:14:27,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5034090.0, ans=0.125 2024-08-21 01:14:32,004 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14450, loss[loss=0.1009, beats_loss=0.009043, ecapa_loss=0.0001575, whisper_loss=0.09024, over 13600.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01025, ecapa_loss=0.00014, whisper_loss=0.09089, over 3804722.23 frames. ], batch size: 56, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:14:44,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-08-21 01:14:58,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=5034290.0, ans=0.0 2024-08-21 01:15:40,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.267e+01 2.442e+01 2.819e+01 1.713e+02, threshold=4.884e+01, percent-clipped=1.0 2024-08-21 01:15:46,099 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 19 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-21 01:16:01,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5034590.0, ans=0.125 2024-08-21 01:16:05,790 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14500, loss[loss=0.07199, beats_loss=0.01257, ecapa_loss=0.0001324, whisper_loss=0.05809, over 16054.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001379, whisper_loss=0.09025, over 3812468.06 frames. 
], batch size: 66, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:16:12,427 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 37 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 01:16:12,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-21 01:16:27,937 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 01:16:55,162 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-21 01:17:13,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5034990.0, ans=0.125 2024-08-21 01:17:19,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5034990.0, ans=0.125 2024-08-21 01:17:44,082 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14550, loss[loss=0.0889, beats_loss=0.01082, ecapa_loss=0.0001352, whisper_loss=0.07673, over 22269.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01031, ecapa_loss=0.0001378, whisper_loss=0.09062, over 3807863.82 frames. ], batch size: 92, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:17:49,790 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 17 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-21 01:17:59,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5035190.0, ans=0.1 2024-08-21 01:18:03,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.46 vs. 
limit=15.0 2024-08-21 01:18:09,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5035290.0, ans=0.0 2024-08-21 01:18:18,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5035290.0, ans=0.0 2024-08-21 01:18:54,127 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.274e+01 2.528e+01 2.801e+01 4.515e+02, threshold=5.056e+01, percent-clipped=2.0 2024-08-21 01:19:15,160 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-21 01:19:21,681 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14600, loss[loss=0.1213, beats_loss=0.01056, ecapa_loss=0.0001567, whisper_loss=0.1092, over 12928.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001387, whisper_loss=0.09011, over 3778879.14 frames. ], batch size: 55, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:19:44,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5035790.0, ans=0.125 2024-08-21 01:19:44,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5035790.0, ans=0.125 2024-08-21 01:20:00,039 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 01:20:00,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-08-21 01:20:06,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5035890.0, ans=0.1 2024-08-21 01:20:17,097 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
26 from LS+wenet, 17 from Vox, 50 fro AS 2024-08-21 01:20:56,985 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14650, loss[loss=0.09984, beats_loss=0.008849, ecapa_loss=0.0001556, whisper_loss=0.08944, over 14614.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001374, whisper_loss=0.09034, over 3754219.37 frames. ], batch size: 58, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:22:06,127 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.288e+01 2.569e+01 2.805e+01 8.601e+01, threshold=5.137e+01, percent-clipped=2.0 2024-08-21 01:22:22,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5036590.0, ans=0.125 2024-08-21 01:22:26,482 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 01:22:32,585 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14700, loss[loss=0.127, beats_loss=0.006831, ecapa_loss=0.0001645, whisper_loss=0.1186, over 23025.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.000138, whisper_loss=0.09071, over 3784551.47 frames. ], batch size: 91, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:22:50,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5036790.0, ans=0.1 2024-08-21 01:23:03,114 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.362e+01 2024-08-21 01:23:03,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5036790.0, ans=0.125 2024-08-21 01:23:10,884 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
28 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 01:23:11,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5036890.0, ans=0.0 2024-08-21 01:23:15,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0 2024-08-21 01:23:27,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5036990.0, ans=0.125 2024-08-21 01:23:36,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5036990.0, ans=0.05 2024-08-21 01:23:38,258 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-21 01:23:44,353 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 01:24:01,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5037090.0, ans=0.05 2024-08-21 01:24:08,876 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14750, loss[loss=0.1133, beats_loss=0.01079, ecapa_loss=0.0001535, whisper_loss=0.101, over 22286.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0103, ecapa_loss=0.0001392, whisper_loss=0.09048, over 3779481.40 frames. ], batch size: 90, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:24:16,142 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 15 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-21 01:24:49,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=12.0 2024-08-21 01:25:03,422 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 
14 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-21 01:25:10,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5037490.0, ans=0.0 2024-08-21 01:25:13,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5037490.0, ans=0.125 2024-08-21 01:25:15,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5037490.0, ans=0.125 2024-08-21 01:25:20,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.178e+01 2.451e+01 2.819e+01 4.132e+01, threshold=4.902e+01, percent-clipped=0.0 2024-08-21 01:25:33,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5037590.0, ans=0.125 2024-08-21 01:25:43,828 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 22 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-21 01:25:45,592 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 01:25:47,089 INFO [train_multi_KD3.py:1117] (0/4) Epoch 34, batch 14800, loss[loss=0.1099, beats_loss=0.009116, ecapa_loss=0.0001507, whisper_loss=0.09932, over 22361.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001405, whisper_loss=0.08997, over 3773590.87 frames. ], batch size: 90, lr: 1.78e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:25:53,762 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
30 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-21 01:25:57,831 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-34.pt 2024-08-21 01:26:25,276 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 0, loss[loss=0.1002, beats_loss=0.008672, ecapa_loss=0.0001499, whisper_loss=0.09006, over 22375.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.008672, ecapa_loss=0.0001499, whisper_loss=0.09006, over 22375.00 frames. ], batch size: 91, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:26:25,278 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-21 01:27:00,137 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2488, over 931116.00 frames. 2024-08-21 01:27:22,765 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on SV_voxceleb1: loss=0.003936, beats_loss=0, ecapa_loss=0.0003936, whisper_loss=0, over 944235.00 frames. 2024-08-21 01:28:59,219 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on AT_audioset: loss=0.02305, beats_loss=0.02305, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 01:28:59,223 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-21 01:29:23,045 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 23 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 01:29:31,798 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-21 01:29:46,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5037850.0, ans=0.0 2024-08-21 01:29:54,121 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
32 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-21 01:30:02,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5037950.0, ans=0.1 2024-08-21 01:30:28,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5038050.0, ans=0.125 2024-08-21 01:30:53,771 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 12 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-21 01:31:05,017 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 50, loss[loss=0.09609, beats_loss=0.01017, ecapa_loss=0.0001778, whisper_loss=0.08414, over 19198.00 frames. ], tot_loss[loss=0.09846, beats_loss=0.009468, ecapa_loss=0.0001401, whisper_loss=0.08759, over 843574.16 frames. ], batch size: 80, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:31:05,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2024-08-21 01:31:09,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5038250.0, ans=0.125 2024-08-21 01:31:19,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5038250.0, ans=0.125 2024-08-21 01:31:36,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5038350.0, ans=0.125 2024-08-21 01:31:47,327 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.876e+00 2024-08-21 01:31:54,432 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
22 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-21 01:32:13,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5038450.0, ans=0.125 2024-08-21 01:32:18,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.109e+01 2.525e+01 2.864e+01 3.213e+01 4.437e+01, threshold=5.728e+01, percent-clipped=0.0 2024-08-21 01:32:22,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-21 01:33:13,878 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 100, loss[loss=0.08088, beats_loss=0.009342, ecapa_loss=0.0001462, whisper_loss=0.07008, over 16677.00 frames. ], tot_loss[loss=0.09981, beats_loss=0.009353, ecapa_loss=0.0001384, whisper_loss=0.08907, over 1525103.29 frames. ], batch size: 69, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:33:43,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5038850.0, ans=0.125 2024-08-21 01:33:51,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5038850.0, ans=0.0 2024-08-21 01:33:59,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5038850.0, ans=0.125 2024-08-21 01:34:15,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5038950.0, ans=0.0 2024-08-21 01:34:27,814 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 01:35:19,885 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 150, loss[loss=0.0857, beats_loss=0.008189, ecapa_loss=0.0001948, whisper_loss=0.07557, over 18169.00 frames. 
], tot_loss[loss=0.1002, beats_loss=0.009217, ecapa_loss=0.00014, whisper_loss=0.08954, over 2028896.15 frames. ], batch size: 76, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:35:26,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2024-08-21 01:35:29,077 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-21 01:36:25,331 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.962e+01 2.462e+01 2.718e+01 2.997e+01 1.008e+02, threshold=5.437e+01, percent-clipped=1.0 2024-08-21 01:36:42,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5039550.0, ans=0.125 2024-08-21 01:37:07,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5039750.0, ans=0.0 2024-08-21 01:37:08,660 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 200, loss[loss=0.11, beats_loss=0.007832, ecapa_loss=0.0001638, whisper_loss=0.1006, over 15297.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.00948, ecapa_loss=0.0001379, whisper_loss=0.09073, over 2441944.54 frames. 
], batch size: 57, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:37:13,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5039750.0, ans=0.125 2024-08-21 01:37:13,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5039750.0, ans=0.1 2024-08-21 01:37:15,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5039750.0, ans=0.1 2024-08-21 01:37:20,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2024-08-21 01:37:31,186 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 33 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-21 01:37:53,251 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-504000.pt 2024-08-21 01:38:07,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5040050.0, ans=0.125 2024-08-21 01:38:21,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5040050.0, ans=0.125 2024-08-21 01:38:23,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.28 vs. limit=15.0 2024-08-21 01:38:39,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.17 vs. 
limit=12.0 2024-08-21 01:38:41,766 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 250, loss[loss=0.07155, beats_loss=0.01133, ecapa_loss=0.0001491, whisper_loss=0.05873, over 14558.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009721, ecapa_loss=0.0001385, whisper_loss=0.09046, over 2716881.05 frames. ], batch size: 60, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:38:44,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0 2024-08-21 01:39:05,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=5040350.0, ans=15.0 2024-08-21 01:39:09,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5040350.0, ans=0.2 2024-08-21 01:39:19,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5040450.0, ans=0.0 2024-08-21 01:39:32,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5040450.0, ans=0.0 2024-08-21 01:39:39,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.294e+01 2.516e+01 2.828e+01 4.079e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-21 01:40:04,667 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 33 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-21 01:40:17,472 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 300, loss[loss=0.1044, beats_loss=0.0113, ecapa_loss=0.0001379, whisper_loss=0.09175, over 21947.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.009883, ecapa_loss=0.0001376, whisper_loss=0.09052, over 2939729.69 frames. 
], batch size: 90, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:40:23,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.48 vs. limit=22.5 2024-08-21 01:40:52,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2024-08-21 01:40:55,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5040950.0, ans=0.05 2024-08-21 01:41:51,468 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 350, loss[loss=0.09768, beats_loss=0.01196, ecapa_loss=0.0001087, whisper_loss=0.08463, over 18818.00 frames. ], tot_loss[loss=0.102, beats_loss=0.009912, ecapa_loss=0.0001377, whisper_loss=0.09069, over 3126620.97 frames. ], batch size: 73, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:41:54,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5041250.0, ans=0.125 2024-08-21 01:42:07,192 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 
25 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-21 01:42:24,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5041350.0, ans=0.125 2024-08-21 01:42:37,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5041450.0, ans=0.0 2024-08-21 01:42:43,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.323e+01 2.496e+01 2.850e+01 5.461e+01, threshold=4.991e+01, percent-clipped=1.0 2024-08-21 01:42:56,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5041550.0, ans=0.0 2024-08-21 01:43:00,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.78 vs. limit=10.0 2024-08-21 01:43:07,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.19 vs. limit=15.0 2024-08-21 01:43:11,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0 2024-08-21 01:43:17,849 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 01:43:19,306 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 400, loss[loss=0.1035, beats_loss=0.01026, ecapa_loss=0.0001254, whisper_loss=0.09202, over 23402.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01008, ecapa_loss=0.000137, whisper_loss=0.08952, over 3271630.13 frames. ], batch size: 88, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:43:21,312 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
22 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-21 01:43:21,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5041750.0, ans=0.1 2024-08-21 01:43:28,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5041750.0, ans=0.125 2024-08-21 01:43:38,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5041850.0, ans=0.0 2024-08-21 01:43:56,358 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-21 01:44:17,960 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 01:44:20,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5042050.0, ans=0.125 2024-08-21 01:44:24,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=12.0 2024-08-21 01:44:36,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2024-08-21 01:44:43,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5042150.0, ans=0.0 2024-08-21 01:44:50,434 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 450, loss[loss=0.121, beats_loss=0.007517, ecapa_loss=0.0001398, whisper_loss=0.1121, over 20250.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01014, ecapa_loss=0.0001366, whisper_loss=0.08969, over 3382187.96 frames. ], batch size: 75, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:45:04,314 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
25 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-21 01:45:29,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2024-08-21 01:45:37,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2024-08-21 01:45:43,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+01 2.267e+01 2.494e+01 2.807e+01 3.587e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-21 01:45:48,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5042550.0, ans=0.125 2024-08-21 01:45:49,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5042550.0, ans=0.0 2024-08-21 01:45:59,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5042550.0, ans=0.125 2024-08-21 01:46:20,995 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 500, loss[loss=0.1037, beats_loss=0.01004, ecapa_loss=0.0001396, whisper_loss=0.09226, over 23298.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01015, ecapa_loss=0.0001367, whisper_loss=0.08949, over 3471866.18 frames. ], batch size: 93, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:46:23,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2024-08-21 01:46:24,240 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.724e+05 2024-08-21 01:46:41,679 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 22 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-21 01:46:43,901 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-21 01:46:50,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5042850.0, ans=0.1 2024-08-21 01:47:06,611 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 01:47:31,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=5043050.0, ans=10.0 2024-08-21 01:47:56,963 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-21 01:47:58,087 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 550, loss[loss=0.1124, beats_loss=0.01037, ecapa_loss=0.0001391, whisper_loss=0.1007, over 23514.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01024, ecapa_loss=0.0001365, whisper_loss=0.0893, over 3537662.04 frames. ], batch size: 92, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:48:08,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-08-21 01:48:20,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5043350.0, ans=0.04949747468305833 2024-08-21 01:48:44,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5043450.0, ans=0.125 2024-08-21 01:48:54,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.277e+01 2.484e+01 2.909e+01 4.062e+02, threshold=4.967e+01, percent-clipped=2.0 2024-08-21 01:48:55,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2024-08-21 01:49:06,626 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 01:49:11,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5043650.0, ans=0.2 2024-08-21 01:49:14,439 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 18 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-21 01:49:30,491 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 600, loss[loss=0.1141, beats_loss=0.008907, ecapa_loss=0.0001303, whisper_loss=0.1039, over 23190.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01019, ecapa_loss=0.0001368, whisper_loss=0.08976, over 3578384.86 frames. ], batch size: 88, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:49:45,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5043750.0, ans=0.0 2024-08-21 01:49:54,191 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-21 01:49:56,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5043850.0, ans=0.2 2024-08-21 01:50:02,034 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 01:50:06,283 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 24 from Vox, 17 fro AS 2024-08-21 01:50:51,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5044150.0, ans=0.125 2024-08-21 01:51:00,067 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 650, loss[loss=0.1061, beats_loss=0.01127, ecapa_loss=0.0001762, whisper_loss=0.09302, over 17384.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01015, ecapa_loss=0.0001373, whisper_loss=0.08935, over 3615119.30 frames. 
], batch size: 76, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:51:01,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5044250.0, ans=0.0 2024-08-21 01:51:07,030 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-21 01:51:09,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5044250.0, ans=0.2 2024-08-21 01:51:29,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5 2024-08-21 01:51:45,250 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-21 01:51:46,991 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 22 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-21 01:51:48,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5044450.0, ans=0.125 2024-08-21 01:51:50,662 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.691e-01 2024-08-21 01:51:51,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.231e+01 2.431e+01 2.762e+01 3.963e+01, threshold=4.863e+01, percent-clipped=0.0 2024-08-21 01:52:27,919 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 700, loss[loss=0.07284, beats_loss=0.01054, ecapa_loss=0.0001655, whisper_loss=0.06065, over 15423.00 frames. ], tot_loss[loss=0.09989, beats_loss=0.01028, ecapa_loss=0.0001376, whisper_loss=0.08823, over 3641694.90 frames. ], batch size: 65, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:53:16,078 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-21 01:53:19,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5045050.0, ans=0.04949747468305833 2024-08-21 01:53:29,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=5045050.0, ans=0.125 2024-08-21 01:53:31,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5045050.0, ans=0.0 2024-08-21 01:53:51,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5045150.0, ans=0.125 2024-08-21 01:53:56,835 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 750, loss[loss=0.08376, beats_loss=0.01344, ecapa_loss=0.0001311, whisper_loss=0.06901, over 20976.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01027, ecapa_loss=0.0001378, whisper_loss=0.0887, over 3681000.92 frames. ], batch size: 87, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:53:57,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5045250.0, ans=0.2 2024-08-21 01:54:09,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5045250.0, ans=0.0 2024-08-21 01:54:35,541 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-21 01:54:35,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5045450.0, ans=0.125 2024-08-21 01:54:44,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.39 vs. 
limit=15.0 2024-08-21 01:54:49,505 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.245e+01 2.497e+01 2.753e+01 9.624e+01, threshold=4.995e+01, percent-clipped=1.0 2024-08-21 01:54:51,373 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-21 01:55:08,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5045650.0, ans=0.125 2024-08-21 01:55:19,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5045650.0, ans=0.1 2024-08-21 01:55:25,995 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 800, loss[loss=0.07924, beats_loss=0.01059, ecapa_loss=0.0001606, whisper_loss=0.06705, over 20292.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01025, ecapa_loss=0.0001385, whisper_loss=0.0889, over 3708625.67 frames. ], batch size: 85, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:55:29,688 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-21 01:55:39,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5045750.0, ans=0.125 2024-08-21 01:55:56,208 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-21 01:56:05,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-08-21 01:56:07,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=5045950.0, ans=0.125 2024-08-21 01:56:11,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. 
limit=15.0 2024-08-21 01:56:12,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0 2024-08-21 01:56:29,823 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-21 01:56:31,554 WARNING [optim.py:496] (0/4) Scaling gradients by 0.057700227946043015, model_norm_threshold=49.945823669433594 2024-08-21 01:56:31,728 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.361e+05, grad_sumsq=1.361e+05, orig_rms_sq=1.000e+00 2024-08-21 01:56:53,844 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 850, loss[loss=0.0969, beats_loss=0.01283, ecapa_loss=7.818e-05, whisper_loss=0.08329, over 15973.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01024, ecapa_loss=0.0001382, whisper_loss=0.08847, over 3706278.35 frames. ], batch size: 59, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:57:05,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5046250.0, ans=0.1 2024-08-21 01:57:21,672 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 01:57:46,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5046450.0, ans=0.09899494936611666 2024-08-21 01:57:48,502 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.265e+01 2.514e+01 2.854e+01 8.656e+02, threshold=5.028e+01, percent-clipped=3.0 2024-08-21 01:58:20,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5046650.0, ans=0.2 2024-08-21 01:58:25,206 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 900, loss[loss=0.07987, beats_loss=0.01304, ecapa_loss=0.0001214, whisper_loss=0.06562, over 20946.00 frames. ], tot_loss[loss=0.09943, beats_loss=0.0103, ecapa_loss=0.0001378, whisper_loss=0.08776, over 3720326.98 frames. ], batch size: 86, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 01:58:35,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2024-08-21 01:58:39,722 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 17 from LS+wenet, 20 from Vox, 14 fro AS 2024-08-21 01:58:43,670 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.650e-01 2024-08-21 01:58:51,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=12.0 2024-08-21 01:59:03,397 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 01:59:08,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5046950.0, ans=0.2 2024-08-21 01:59:20,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5047050.0, ans=0.04949747468305833 2024-08-21 01:59:27,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2024-08-21 01:59:38,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5047150.0, ans=0.0 2024-08-21 01:59:43,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5047150.0, ans=0.125 2024-08-21 01:59:55,767 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 950, loss[loss=0.09448, beats_loss=0.01249, ecapa_loss=0.0001231, whisper_loss=0.08076, over 22036.00 frames. ], tot_loss[loss=0.09994, beats_loss=0.01023, ecapa_loss=0.0001366, whisper_loss=0.08835, over 3721081.38 frames. ], batch size: 89, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:00:19,832 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 21 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 02:00:32,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5047450.0, ans=0.125 2024-08-21 02:00:36,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5047450.0, ans=0.0 2024-08-21 02:00:37,552 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
24 from LS+wenet, 14 from Vox, 19 fro AS 2024-08-21 02:00:42,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-08-21 02:00:46,254 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 02:00:48,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.565e+01 2.167e+01 2.360e+01 2.609e+01 1.184e+02, threshold=4.721e+01, percent-clipped=1.0 2024-08-21 02:00:48,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5047550.0, ans=0.0 2024-08-21 02:00:55,831 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 02:01:00,840 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 17 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-21 02:01:19,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=15.0 2024-08-21 02:01:20,009 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-21 02:01:23,452 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1000, loss[loss=0.07929, beats_loss=0.01154, ecapa_loss=0.0001209, whisper_loss=0.06653, over 19984.00 frames. ], tot_loss[loss=0.09958, beats_loss=0.0102, ecapa_loss=0.000137, whisper_loss=0.08801, over 3687872.88 frames. ], batch size: 80, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:01:26,804 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 02:01:30,730 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 02:01:51,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5047850.0, ans=0.0 2024-08-21 02:01:51,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=12.0 2024-08-21 02:01:56,239 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 02:02:20,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-21 02:02:30,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5048050.0, ans=0.0 2024-08-21 02:02:52,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5048250.0, ans=0.015 2024-08-21 02:02:53,582 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1050, loss[loss=0.08619, beats_loss=0.01079, ecapa_loss=0.0001255, whisper_loss=0.07414, over 22382.00 frames. ], tot_loss[loss=0.09865, beats_loss=0.01024, ecapa_loss=0.0001375, whisper_loss=0.08703, over 3681841.74 frames. ], batch size: 89, lr: 1.76e-03, grad_scale: 2.8823037615171174e+17 2024-08-21 02:02:55,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5048250.0, ans=0.0 2024-08-21 02:03:03,028 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
37 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 02:03:34,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5048450.0, ans=0.05 2024-08-21 02:03:50,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.332e+01 2.561e+01 2.821e+01 8.058e+01, threshold=5.122e+01, percent-clipped=2.0 2024-08-21 02:04:15,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5048650.0, ans=0.125 2024-08-21 02:04:18,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-21 02:04:27,045 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1100, loss[loss=0.09672, beats_loss=0.0103, ecapa_loss=0.0001086, whisper_loss=0.08534, over 19488.00 frames. ], tot_loss[loss=0.099, beats_loss=0.01025, ecapa_loss=0.0001361, whisper_loss=0.08739, over 3691650.19 frames. ], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:04:41,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5048750.0, ans=0.125 2024-08-21 02:04:51,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=5048850.0, ans=0.0 2024-08-21 02:05:02,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5048950.0, ans=0.125 2024-08-21 02:05:09,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5048950.0, ans=0.0 2024-08-21 02:05:26,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. 
limit=6.0 2024-08-21 02:05:35,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5049050.0, ans=0.125 2024-08-21 02:05:56,356 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-21 02:05:58,277 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1150, loss[loss=0.1068, beats_loss=0.009064, ecapa_loss=0.0001257, whisper_loss=0.09645, over 15809.00 frames. ], tot_loss[loss=0.09964, beats_loss=0.01023, ecapa_loss=0.0001364, whisper_loss=0.08805, over 3702685.06 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:06:13,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5049250.0, ans=0.125 2024-08-21 02:06:50,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.348e+01 2.579e+01 2.822e+01 4.118e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-21 02:06:52,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-21 02:07:25,366 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1200, loss[loss=0.1193, beats_loss=0.008336, ecapa_loss=0.0001541, whisper_loss=0.1094, over 23905.00 frames. ], tot_loss[loss=0.09912, beats_loss=0.01037, ecapa_loss=0.0001357, whisper_loss=0.08739, over 3697851.43 frames. ], batch size: 94, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:07:45,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.96 vs. 
limit=15.0 2024-08-21 02:08:19,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5050050.0, ans=0.2 2024-08-21 02:08:46,252 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 02:08:52,692 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1250, loss[loss=0.08995, beats_loss=0.01284, ecapa_loss=0.0001183, whisper_loss=0.07593, over 18696.00 frames. ], tot_loss[loss=0.09932, beats_loss=0.01042, ecapa_loss=0.000134, whisper_loss=0.08756, over 3685969.91 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:09:13,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5050350.0, ans=0.0 2024-08-21 02:09:32,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5050450.0, ans=0.0 2024-08-21 02:09:46,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.192e+01 2.364e+01 2.564e+01 4.097e+01, threshold=4.729e+01, percent-clipped=0.0 2024-08-21 02:10:01,786 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-21 02:10:20,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5050650.0, ans=0.125 2024-08-21 02:10:20,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5050650.0, ans=0.2 2024-08-21 02:10:23,332 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1300, loss[loss=0.07701, beats_loss=0.01117, ecapa_loss=0.0001591, whisper_loss=0.06425, over 17529.00 frames. ], tot_loss[loss=0.099, beats_loss=0.01042, ecapa_loss=0.0001357, whisper_loss=0.08722, over 3707338.42 frames. 
], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:10:42,474 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-21 02:10:56,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2024-08-21 02:11:00,778 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 26 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-21 02:11:08,117 WARNING [optim.py:496] (0/4) Scaling gradients by 0.01577102579176426, model_norm_threshold=47.28926467895508 2024-08-21 02:11:08,290 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.32, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.852e+06, grad_sumsq=2.852e+06, orig_rms_sq=1.000e+00 2024-08-21 02:11:09,773 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 28 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-21 02:11:16,400 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-21 02:11:23,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-08-21 02:11:52,008 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1350, loss[loss=0.107, beats_loss=0.01108, ecapa_loss=0.000132, whisper_loss=0.09456, over 22343.00 frames. ], tot_loss[loss=0.09975, beats_loss=0.01032, ecapa_loss=0.0001361, whisper_loss=0.08807, over 3696249.76 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:12:01,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5051250.0, ans=0.1 2024-08-21 02:12:04,769 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 02:12:40,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.54 vs. limit=22.5 2024-08-21 02:12:47,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.226e+01 2.529e+01 2.867e+01 2.998e+03, threshold=5.057e+01, percent-clipped=1.0 2024-08-21 02:13:21,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5051750.0, ans=0.125 2024-08-21 02:13:23,047 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1400, loss[loss=0.09347, beats_loss=0.01041, ecapa_loss=0.0001054, whisper_loss=0.08201, over 16879.00 frames. ], tot_loss[loss=0.09957, beats_loss=0.01041, ecapa_loss=0.0001348, whisper_loss=0.0878, over 3693841.48 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:13:45,855 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-21 02:13:48,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5051850.0, ans=0.125 2024-08-21 02:14:01,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5051950.0, ans=0.2 2024-08-21 02:14:04,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5051950.0, ans=0.2 2024-08-21 02:14:37,726 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
19 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-21 02:14:54,645 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:14:55,677 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1450, loss[loss=0.09316, beats_loss=0.01189, ecapa_loss=0.0001443, whisper_loss=0.07983, over 21307.00 frames. ], tot_loss[loss=0.09913, beats_loss=0.01035, ecapa_loss=0.0001358, whisper_loss=0.08742, over 3679145.58 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:14:57,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5052250.0, ans=0.125 2024-08-21 02:15:10,139 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 20 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-21 02:15:28,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.13 vs. limit=10.0 2024-08-21 02:15:31,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5052450.0, ans=0.125 2024-08-21 02:15:33,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5052450.0, ans=0.1 2024-08-21 02:15:38,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5052450.0, ans=0.125 2024-08-21 02:15:49,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.283e+01 2.598e+01 2.871e+01 4.818e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-21 02:15:49,568 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 
24 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 02:16:09,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.95 vs. limit=15.0 2024-08-21 02:16:11,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5052550.0, ans=0.2 2024-08-21 02:16:17,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5052550.0, ans=0.95 2024-08-21 02:16:20,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5052650.0, ans=0.125 2024-08-21 02:16:39,071 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 02:16:42,135 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1500, loss[loss=0.1135, beats_loss=0.008436, ecapa_loss=0.0001202, whisper_loss=0.1039, over 14063.00 frames. ], tot_loss[loss=0.09923, beats_loss=0.01043, ecapa_loss=0.0001352, whisper_loss=0.08745, over 3696847.32 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:16:47,969 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-21 02:17:08,535 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-21 02:17:10,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5052850.0, ans=0.125 2024-08-21 02:17:11,937 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
29 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-21 02:17:28,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5052950.0, ans=0.1 2024-08-21 02:17:44,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5053050.0, ans=0.95 2024-08-21 02:18:15,472 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 21 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-21 02:18:16,708 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1550, loss[loss=0.07716, beats_loss=0.01166, ecapa_loss=0.0001473, whisper_loss=0.06402, over 19916.00 frames. ], tot_loss[loss=0.09856, beats_loss=0.01049, ecapa_loss=0.0001354, whisper_loss=0.08672, over 3717953.61 frames. ], batch size: 82, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:18:39,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5053350.0, ans=0.04949747468305833 2024-08-21 02:19:07,943 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 02:19:13,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+01 2.172e+01 2.388e+01 2.761e+01 1.037e+02, threshold=4.777e+01, percent-clipped=1.0 2024-08-21 02:19:23,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5053550.0, ans=0.2 2024-08-21 02:19:50,434 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1600, loss[loss=0.08883, beats_loss=0.01189, ecapa_loss=0.0001677, whisper_loss=0.07526, over 17267.00 frames. ], tot_loss[loss=0.09908, beats_loss=0.01045, ecapa_loss=0.0001345, whisper_loss=0.08729, over 3696781.51 frames. 
], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:19:52,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5053750.0, ans=0.0 2024-08-21 02:19:53,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. limit=5.0 2024-08-21 02:19:57,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5053750.0, ans=0.1 2024-08-21 02:20:04,567 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-21 02:20:08,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.01 vs. limit=10.0 2024-08-21 02:20:10,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2024-08-21 02:20:13,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5053850.0, ans=0.1 2024-08-21 02:20:34,059 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 26 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 02:20:36,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.68 vs. 
limit=22.5 2024-08-21 02:20:43,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5054050.0, ans=0.1 2024-08-21 02:20:43,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5054050.0, ans=0.125 2024-08-21 02:20:52,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=5054050.0, ans=15.0 2024-08-21 02:20:53,743 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-21 02:21:08,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5054150.0, ans=0.0 2024-08-21 02:21:08,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5054150.0, ans=0.0 2024-08-21 02:21:11,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5054150.0, ans=0.1 2024-08-21 02:21:17,950 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 29 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 02:21:20,729 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1650, loss[loss=0.1026, beats_loss=0.01028, ecapa_loss=0.0001581, whisper_loss=0.0907, over 21810.00 frames. ], tot_loss[loss=0.09995, beats_loss=0.01036, ecapa_loss=0.0001344, whisper_loss=0.08825, over 3722614.78 frames. 
], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:21:23,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5054250.0, ans=0.07 2024-08-21 02:21:40,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5054350.0, ans=0.0 2024-08-21 02:21:44,396 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 17 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-21 02:21:46,195 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 33 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-21 02:21:48,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5054350.0, ans=0.125 2024-08-21 02:22:05,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5054450.0, ans=0.1 2024-08-21 02:22:13,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=12.0 2024-08-21 02:22:14,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.289e+01 2.512e+01 2.829e+01 4.001e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-21 02:22:20,731 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 02:22:51,665 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1700, loss[loss=0.1078, beats_loss=0.007581, ecapa_loss=0.0001371, whisper_loss=0.09881, over 17290.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01033, ecapa_loss=0.0001348, whisper_loss=0.08851, over 3721745.31 frames. 
], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:23:06,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5054750.0, ans=0.1 2024-08-21 02:23:06,989 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 23 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-21 02:23:18,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5054850.0, ans=0.125 2024-08-21 02:23:32,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5054950.0, ans=0.125 2024-08-21 02:23:38,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5054950.0, ans=0.2 2024-08-21 02:23:42,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5054950.0, ans=0.09899494936611666 2024-08-21 02:23:45,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5055050.0, ans=0.1 2024-08-21 02:23:54,046 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 12 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-21 02:24:09,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5055150.0, ans=0.0 2024-08-21 02:24:23,802 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1750, loss[loss=0.08588, beats_loss=0.009704, ecapa_loss=0.0001493, whisper_loss=0.07468, over 13217.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01026, ecapa_loss=0.0001349, whisper_loss=0.08863, over 3718443.83 frames. 
], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:24:29,874 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.513e-02 2024-08-21 02:24:41,720 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.141e+00 2024-08-21 02:24:49,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. limit=10.0 2024-08-21 02:25:05,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5055450.0, ans=0.04949747468305833 2024-08-21 02:25:19,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.242e+01 2.435e+01 2.822e+01 2.727e+02, threshold=4.871e+01, percent-clipped=1.0 2024-08-21 02:25:23,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5055550.0, ans=0.125 2024-08-21 02:25:37,485 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 28 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-21 02:25:41,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5055650.0, ans=0.2 2024-08-21 02:25:48,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5055650.0, ans=0.125 2024-08-21 02:25:55,139 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1800, loss[loss=0.1078, beats_loss=0.01003, ecapa_loss=0.0001115, whisper_loss=0.09667, over 22183.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0102, ecapa_loss=0.0001349, whisper_loss=0.08913, over 3720343.35 frames. ], batch size: 83, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:26:10,101 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
27 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-21 02:26:46,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5055950.0, ans=0.125 2024-08-21 02:26:49,476 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 02:26:58,931 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 14 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-21 02:27:14,482 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 24 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 02:27:23,315 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 38 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-21 02:27:25,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5056250.0, ans=0.125 2024-08-21 02:27:26,623 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1850, loss[loss=0.1267, beats_loss=0.00843, ecapa_loss=0.0001326, whisper_loss=0.117, over 21460.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01024, ecapa_loss=0.0001346, whisper_loss=0.08976, over 3759793.98 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:27:44,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5056350.0, ans=0.1 2024-08-21 02:27:51,514 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 02:27:51,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5056350.0, ans=0.2 2024-08-21 02:27:53,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5056350.0, ans=0.125 2024-08-21 02:28:21,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.279e+01 2.495e+01 2.833e+01 4.581e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-21 02:28:30,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5056550.0, ans=0.125 2024-08-21 02:28:43,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5056650.0, ans=0.0 2024-08-21 02:28:55,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2024-08-21 02:28:58,298 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1900, loss[loss=0.07325, beats_loss=0.01112, ecapa_loss=0.0001303, whisper_loss=0.06083, over 18875.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01024, ecapa_loss=0.0001335, whisper_loss=0.08929, over 3733248.76 frames. ], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:29:14,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5056750.0, ans=0.0 2024-08-21 02:29:29,528 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-21 02:29:34,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-08-21 02:29:55,920 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 02:30:03,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5057050.0, ans=0.2 2024-08-21 02:30:04,874 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 16 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-21 02:30:14,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-08-21 02:30:21,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5057150.0, ans=0.125 2024-08-21 02:30:29,924 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 1950, loss[loss=0.1046, beats_loss=0.009901, ecapa_loss=0.0001041, whisper_loss=0.09363, over 14044.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01019, ecapa_loss=0.0001341, whisper_loss=0.08958, over 3720933.95 frames. ], batch size: 51, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:30:51,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5057350.0, ans=0.125 2024-08-21 02:31:04,428 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-21 02:31:10,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.61 vs. 
limit=22.5 2024-08-21 02:31:17,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5057450.0, ans=0.2 2024-08-21 02:31:25,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.215e+01 2.451e+01 2.695e+01 5.295e+01, threshold=4.901e+01, percent-clipped=1.0 2024-08-21 02:31:35,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5057550.0, ans=0.04949747468305833 2024-08-21 02:31:52,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5057650.0, ans=0.0 2024-08-21 02:31:52,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5057650.0, ans=0.0 2024-08-21 02:32:00,973 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 19 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 02:32:02,174 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2000, loss[loss=0.08749, beats_loss=0.01176, ecapa_loss=0.000123, whisper_loss=0.0745, over 18347.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01023, ecapa_loss=0.000134, whisper_loss=0.08895, over 3695288.93 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:32:11,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5057750.0, ans=0.125 2024-08-21 02:32:14,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.61 vs. 
limit=15.0 2024-08-21 02:32:26,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5057850.0, ans=0.125 2024-08-21 02:32:54,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5057950.0, ans=0.1 2024-08-21 02:32:54,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2024-08-21 02:33:34,314 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2050, loss[loss=0.1061, beats_loss=0.009351, ecapa_loss=0.0001679, whisper_loss=0.09507, over 22306.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01026, ecapa_loss=0.0001335, whisper_loss=0.08882, over 3699356.62 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:34:11,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. 
limit=10.0 2024-08-21 02:34:12,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5058450.0, ans=0.125 2024-08-21 02:34:14,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5058450.0, ans=0.125 2024-08-21 02:34:27,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5058450.0, ans=0.1 2024-08-21 02:34:30,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.290e+01 2.506e+01 2.810e+01 1.281e+02, threshold=5.013e+01, percent-clipped=3.0 2024-08-21 02:34:49,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5058650.0, ans=0.1 2024-08-21 02:35:02,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5058650.0, ans=0.0 2024-08-21 02:35:06,278 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2100, loss[loss=0.07817, beats_loss=0.01259, ecapa_loss=0.0001272, whisper_loss=0.06431, over 13851.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01026, ecapa_loss=0.0001339, whisper_loss=0.08873, over 3693594.48 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:35:15,838 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
30 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 02:35:16,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5058750.0, ans=0.2 2024-08-21 02:35:46,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5058950.0, ans=0.125 2024-08-21 02:35:46,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5058950.0, ans=0.125 2024-08-21 02:35:57,844 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 02:36:08,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5059050.0, ans=0.0 2024-08-21 02:36:10,584 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.155e-01 2024-08-21 02:36:28,085 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 38 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-21 02:36:37,061 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2150, loss[loss=0.1116, beats_loss=0.007198, ecapa_loss=0.0001567, whisper_loss=0.1028, over 14843.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01032, ecapa_loss=0.0001322, whisper_loss=0.08904, over 3707448.04 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:36:55,570 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 02:36:56,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.69 vs. 
limit=15.0 2024-08-21 02:36:59,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5059350.0, ans=0.5 2024-08-21 02:37:35,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.212e+01 2.472e+01 2.786e+01 4.629e+01, threshold=4.943e+01, percent-clipped=0.0 2024-08-21 02:38:04,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5059650.0, ans=0.125 2024-08-21 02:38:11,741 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 13 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-21 02:38:13,179 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2200, loss[loss=0.07124, beats_loss=0.01248, ecapa_loss=0.0001027, whisper_loss=0.05774, over 18021.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01038, ecapa_loss=0.0001328, whisper_loss=0.08847, over 3729669.96 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 02:38:16,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5059750.0, ans=0.09899494936611666 2024-08-21 02:38:35,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5059850.0, ans=0.0 2024-08-21 02:38:36,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-08-21 02:38:50,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=22.5 2024-08-21 02:39:04,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5059950.0, ans=0.125 2024-08-21 02:39:06,343 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
26 from LS+wenet, 16 from Vox, 32 from AS
2024-08-21 02:39:21,646 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 from AS
2024-08-21 02:39:43,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5060250.0, ans=0.125
2024-08-21 02:39:45,177 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2250, loss[loss=0.1031, beats_loss=0.012, ecapa_loss=0.0001046, whisper_loss=0.09009, over 22682.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01037, ecapa_loss=0.000133, whisper_loss=0.08899, over 3723957.65 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:39:53,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5060250.0, ans=0.0
2024-08-21 02:40:11,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5060350.0, ans=0.0
2024-08-21 02:40:39,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.265e+01 2.538e+01 2.956e+01 4.238e+01, threshold=5.075e+01, percent-clipped=0.0
2024-08-21 02:40:52,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5060550.0, ans=0.07
2024-08-21 02:41:14,831 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2300, loss[loss=0.09646, beats_loss=0.0115, ecapa_loss=0.0001482, whisper_loss=0.08348, over 20048.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001335, whisper_loss=0.08924, over 3725734.05 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:41:23,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5060750.0, ans=0.125
2024-08-21 02:41:27,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5060750.0, ans=0.125
2024-08-21 02:41:32,691 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 14 from LS+wenet, 21 from Vox, 30 from AS
2024-08-21 02:41:53,312 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 23 from LS+wenet, 33 from Vox, 36 from AS
2024-08-21 02:42:25,540 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 21 from LS+wenet, 21 from Vox, 50 from AS
2024-08-21 02:42:27,291 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 from AS
2024-08-21 02:42:29,442 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 from AS
2024-08-21 02:42:48,625 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2350, loss[loss=0.1019, beats_loss=0.00884, ecapa_loss=0.0001453, whisper_loss=0.09163, over 19293.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001356, whisper_loss=0.08993, over 3766755.74 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:42:51,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5061250.0, ans=0.125
2024-08-21 02:43:50,628 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.279e+01 2.504e+01 2.806e+01 9.902e+01, threshold=5.007e+01, percent-clipped=2.0
2024-08-21 02:43:55,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5061550.0, ans=0.125
2024-08-21 02:43:59,660 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.911e-02
2024-08-21 02:44:03,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5061550.0, ans=0.0
2024-08-21 02:44:05,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5061550.0, ans=0.0
2024-08-21 02:44:05,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=5061550.0, ans=0.025
2024-08-21 02:44:17,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5061650.0, ans=0.125
2024-08-21 02:44:24,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=12.0
2024-08-21 02:44:31,004 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2400, loss[loss=0.1044, beats_loss=0.01106, ecapa_loss=0.0001446, whisper_loss=0.09191, over 22158.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01031, ecapa_loss=0.0001365, whisper_loss=0.09078, over 3781142.51 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:44:48,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5061750.0, ans=0.125
2024-08-21 02:44:54,982 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 from AS
2024-08-21 02:45:10,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5061850.0, ans=0.0
2024-08-21 02:45:40,031 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 24 from LS+wenet, 11 from Vox, 20 from AS
2024-08-21 02:45:47,582 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 12 from Vox, 35 from AS
2024-08-21 02:46:23,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=15.0
2024-08-21 02:46:23,535 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2450, loss[loss=0.07271, beats_loss=0.01106, ecapa_loss=0.0001262, whisper_loss=0.06039, over 17873.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01036, ecapa_loss=0.0001356, whisper_loss=0.09054, over 3776570.93 frames. ], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:46:49,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5062350.0, ans=0.05
2024-08-21 02:47:02,721 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 from AS
2024-08-21 02:47:05,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=5062350.0, ans=0.0
2024-08-21 02:47:34,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.316e+01 2.526e+01 2.772e+01 3.117e+02, threshold=5.053e+01, percent-clipped=1.0
2024-08-21 02:47:38,849 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 16 from LS+wenet, 28 from Vox, 33 from AS
2024-08-21 02:47:59,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0
2024-08-21 02:48:08,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5062650.0, ans=0.125
2024-08-21 02:48:24,286 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2500, loss[loss=0.08822, beats_loss=0.01304, ecapa_loss=0.000125, whisper_loss=0.07394, over 16921.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001364, whisper_loss=0.08973, over 3800510.52 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:48:26,164 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 02:48:31,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5062750.0, ans=0.125
2024-08-21 02:49:00,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5062850.0, ans=0.125
2024-08-21 02:49:09,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=12.0
2024-08-21 02:49:16,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.34 vs. limit=10.0
2024-08-21 02:49:25,645 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 14 from LS+wenet, 21 from Vox, 22 from AS
2024-08-21 02:49:33,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5063050.0, ans=0.2
2024-08-21 02:49:47,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5063050.0, ans=0.0
2024-08-21 02:49:49,376 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 26 from Vox, 40 from AS
2024-08-21 02:50:00,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5063150.0, ans=0.125
2024-08-21 02:50:01,619 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 19 from LS+wenet, 27 from Vox, 31 from AS
2024-08-21 02:50:09,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5063150.0, ans=0.04949747468305833
2024-08-21 02:50:13,086 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2550, loss[loss=0.08662, beats_loss=0.007431, ecapa_loss=0.0001508, whisper_loss=0.07768, over 18704.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01032, ecapa_loss=0.000137, whisper_loss=0.09003, over 3794093.01 frames. ], batch size: 74, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:50:24,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5063250.0, ans=0.125
2024-08-21 02:50:26,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=5063250.0, ans=0.2
2024-08-21 02:50:26,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5063250.0, ans=0.95
2024-08-21 02:50:30,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5063250.0, ans=0.0
2024-08-21 02:50:47,370 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 26 from LS+wenet, 25 from Vox, 43 from AS
2024-08-21 02:50:59,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=22.5
2024-08-21 02:51:19,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5063550.0, ans=0.125
2024-08-21 02:51:20,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.281e+01 2.497e+01 2.831e+01 4.835e+01, threshold=4.995e+01, percent-clipped=0.0
2024-08-21 02:51:34,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-08-21 02:52:09,140 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2600, loss[loss=0.1111, beats_loss=0.009968, ecapa_loss=0.000138, whisper_loss=0.09973, over 14532.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001368, whisper_loss=0.09019, over 3780287.93 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:53:25,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5063950.0, ans=0.1
2024-08-21 02:53:42,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5064050.0, ans=0.0
2024-08-21 02:54:05,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5064150.0, ans=0.125
2024-08-21 02:54:07,818 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 10 from Vox, 26 from AS
2024-08-21 02:54:20,666 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 21 from LS+wenet, 21 from Vox, 31 from AS
2024-08-21 02:54:21,848 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2650, loss[loss=0.08849, beats_loss=0.01215, ecapa_loss=0.000178, whisper_loss=0.07457, over 16730.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001368, whisper_loss=0.09014, over 3776470.29 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:54:28,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5064250.0, ans=0.04949747468305833
2024-08-21 02:54:29,422 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 from AS
2024-08-21 02:54:53,857 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 from AS
2024-08-21 02:54:56,608 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 27 from Vox, 25 from AS
2024-08-21 02:55:10,387 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 15 from Vox, 45 from AS
2024-08-21 02:55:24,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5064450.0, ans=0.0
2024-08-21 02:55:41,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.306e+01 2.544e+01 2.939e+01 3.967e+01, threshold=5.087e+01, percent-clipped=0.0
2024-08-21 02:55:45,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5064550.0, ans=0.0
2024-08-21 02:55:50,991 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 20 from LS+wenet, 22 from Vox, 36 from AS
2024-08-21 02:56:05,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=12.0
2024-08-21 02:56:22,967 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 from AS
2024-08-21 02:56:32,253 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2700, loss[loss=0.1196, beats_loss=0.009403, ecapa_loss=0.0001145, whisper_loss=0.109, over 17658.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.000136, whisper_loss=0.08986, over 3786996.44 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:56:44,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0
2024-08-21 02:56:59,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5064850.0, ans=0.1
2024-08-21 02:57:33,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=15.0
2024-08-21 02:57:44,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.69 vs. limit=10.0
2024-08-21 02:57:46,424 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 18 from LS+wenet, 18 from Vox, 23 from AS
2024-08-21 02:57:48,238 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 from AS
2024-08-21 02:58:38,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5065150.0, ans=0.2
2024-08-21 02:58:38,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5065150.0, ans=0.0
2024-08-21 02:58:42,141 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2750, loss[loss=0.0961, beats_loss=0.0114, ecapa_loss=0.0001267, whisper_loss=0.08343, over 18283.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001359, whisper_loss=0.0891, over 3784621.62 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 02:59:09,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5065350.0, ans=0.1
2024-08-21 02:59:33,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5065450.0, ans=0.2
2024-08-21 02:59:59,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.389e+01 2.549e+01 2.769e+01 6.929e+01, threshold=5.098e+01, percent-clipped=1.0
2024-08-21 03:00:42,358 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 27 from LS+wenet, 18 from Vox, 42 from AS
2024-08-21 03:00:45,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5065650.0, ans=0.125
2024-08-21 03:00:47,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=5065750.0, ans=0.2
2024-08-21 03:00:48,659 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2800, loss[loss=0.111, beats_loss=0.008792, ecapa_loss=0.0001563, whisper_loss=0.1006, over 22714.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001351, whisper_loss=0.08906, over 3768650.71 frames. ], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:01:13,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-08-21 03:01:16,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5065850.0, ans=0.0
2024-08-21 03:02:20,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5066050.0, ans=0.0
2024-08-21 03:02:56,665 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2850, loss[loss=0.08762, beats_loss=0.01104, ecapa_loss=0.0001473, whisper_loss=0.07511, over 22249.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001351, whisper_loss=0.08981, over 3779455.70 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:02:58,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5066250.0, ans=0.0
2024-08-21 03:03:11,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.12 vs. limit=22.5
2024-08-21 03:03:40,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0
2024-08-21 03:04:00,871 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 from AS
2024-08-21 03:04:14,242 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.291e+01 2.516e+01 2.868e+01 4.695e+01, threshold=5.032e+01, percent-clipped=0.0
2024-08-21 03:04:29,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5066550.0, ans=0.125
2024-08-21 03:04:50,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5066650.0, ans=0.2
2024-08-21 03:05:00,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5066650.0, ans=0.0
2024-08-21 03:05:07,356 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2900, loss[loss=0.1002, beats_loss=0.01211, ecapa_loss=0.0001289, whisper_loss=0.08678, over 22666.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.000135, whisper_loss=0.09077, over 3783431.62 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:05:11,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=5066750.0, ans=0.1
2024-08-21 03:05:14,436 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 from AS
2024-08-21 03:05:16,937 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 17 from LS+wenet, 17 from Vox, 37 from AS
2024-08-21 03:05:20,141 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 19 from LS+wenet, 20 from Vox, 37 from AS
2024-08-21 03:05:50,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5066850.0, ans=0.125
2024-08-21 03:06:37,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5067050.0, ans=0.125
2024-08-21 03:06:40,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5067050.0, ans=0.1
2024-08-21 03:07:10,187 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 2950, loss[loss=0.1076, beats_loss=0.008825, ecapa_loss=0.0001894, whisper_loss=0.09687, over 18896.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001358, whisper_loss=0.09015, over 3749985.94 frames. ], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:07:32,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5067250.0, ans=0.125
2024-08-21 03:07:51,935 WARNING [optim.py:496] (0/4) Scaling gradients by 0.03678512200713158, model_norm_threshold=50.32452392578125
2024-08-21 03:07:52,106 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.620e+05, grad_sumsq=7.962e+04, orig_rms_sq=3.290e+00
2024-08-21 03:07:52,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5067350.0, ans=0.05
2024-08-21 03:08:19,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.280e+01 2.501e+01 2.875e+01 1.368e+03, threshold=5.003e+01, percent-clipped=1.0
2024-08-21 03:08:30,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5067550.0, ans=0.125
2024-08-21 03:08:45,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0
2024-08-21 03:08:47,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=5067650.0, ans=10.0
2024-08-21 03:08:50,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5067650.0, ans=0.125
2024-08-21 03:09:02,343 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3000, loss[loss=0.07947, beats_loss=0.01017, ecapa_loss=0.0001434, whisper_loss=0.06787, over 14955.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001368, whisper_loss=0.08998, over 3776009.52 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:09:02,344 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss
2024-08-21 03:09:39,374 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005038, whisper_loss=0.2496, over 931116.00 frames.
2024-08-21 03:09:48,568 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.7673, 2.0211, 2.0568, 2.2635, 2.7883, 2.5217, 2.4022, 2.2944], device='cuda:0')
2024-08-21 03:10:01,785 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on SV_voxceleb1: loss=0.003899, beats_loss=0, ecapa_loss=0.0003899, whisper_loss=0, over 944235.00 frames.
2024-08-21 03:11:41,891 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames.
2024-08-21 03:11:41,895 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB
2024-08-21 03:11:42,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5067750.0, ans=0.125
2024-08-21 03:11:43,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5067750.0, ans=0.015
2024-08-21 03:11:45,111 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 from AS
2024-08-21 03:12:16,657 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 from AS
2024-08-21 03:12:28,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5067950.0, ans=0.0
2024-08-21 03:12:33,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5
2024-08-21 03:12:42,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5068050.0, ans=0.09899494936611666
2024-08-21 03:12:47,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5068050.0, ans=0.125
2024-08-21 03:12:49,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5068050.0, ans=0.125
2024-08-21 03:12:53,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.75 vs. limit=10.0
2024-08-21 03:12:54,821 INFO [train_multi_KD3.py:845] (0/4) A total of 97 cuts. 15 from LS+wenet, 32 from Vox, 50 from AS
2024-08-21 03:12:58,398 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 from AS
2024-08-21 03:13:00,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5068150.0, ans=0.0
2024-08-21 03:13:12,663 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3050, loss[loss=0.09251, beats_loss=0.01232, ecapa_loss=0.0001213, whisper_loss=0.07898, over 18349.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001364, whisper_loss=0.08925, over 3779820.93 frames. ], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:13:40,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0
2024-08-21 03:13:42,472 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 from AS
2024-08-21 03:14:08,409 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.251e+01 2.540e+01 2.788e+01 3.733e+01, threshold=5.081e+01, percent-clipped=0.0
2024-08-21 03:14:11,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5068550.0, ans=0.125
2024-08-21 03:14:23,523 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 from AS
2024-08-21 03:14:28,931 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 17 from LS+wenet, 11 from Vox, 29 from AS
2024-08-21 03:14:44,375 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3100, loss[loss=0.07566, beats_loss=0.01266, ecapa_loss=0.0001289, whisper_loss=0.06171, over 16844.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001381, whisper_loss=0.0901, over 3795312.95 frames. ], batch size: 70, lr: 1.75e-03, grad_scale: 1.152921504606847e+18
2024-08-21 03:14:44,946 INFO [train_multi_KD3.py:845] (0/4) A total of 49 cuts. 14 from LS+wenet, 16 from Vox, 19 from AS
2024-08-21 03:15:36,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5068950.0, ans=0.125
2024-08-21 03:15:42,175 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 27 from LS+wenet, 17 from Vox, 21 from AS
2024-08-21 03:16:16,563 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 from AS
2024-08-21 03:16:17,552 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3150, loss[loss=0.1025, beats_loss=0.008878, ecapa_loss=0.0001239, whisper_loss=0.09243, over 18489.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001375, whisper_loss=0.09047, over 3810728.29 frames. ], batch size: 69, lr: 1.75e-03, grad_scale: 1.152921504606847e+18
2024-08-21 03:16:24,954 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 13 from LS+wenet, 12 from Vox, 25 from AS
2024-08-21 03:17:00,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5069450.0, ans=0.035
2024-08-21 03:17:05,892 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 30 from LS+wenet, 16 from Vox, 36 from AS
2024-08-21 03:17:08,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=12.0
2024-08-21 03:17:12,317 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.426e+01 2.655e+01 2.939e+01 1.391e+02, threshold=5.310e+01, percent-clipped=2.0
2024-08-21 03:17:31,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5069650.0, ans=0.0
2024-08-21 03:17:45,121 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 23 from LS+wenet, 16 from Vox, 27 from AS
2024-08-21 03:17:48,329 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3200, loss[loss=0.09074, beats_loss=0.0129, ecapa_loss=0.0001212, whisper_loss=0.07663, over 22248.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001375, whisper_loss=0.0906, over 3808382.75 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 1.152921504606847e+18
2024-08-21 03:18:02,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5069750.0, ans=0.1
2024-08-21 03:18:05,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0
2024-08-21 03:18:19,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5069850.0, ans=0.2
2024-08-21 03:18:21,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5069850.0, ans=0.125
2024-08-21 03:18:25,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5069950.0, ans=0.125
2024-08-21 03:18:34,220 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-21 03:18:39,595 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 23 from LS+wenet, 22 from Vox, 39 from AS
2024-08-21 03:18:40,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.03 vs. limit=6.0
2024-08-21 03:18:47,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5070050.0, ans=0.0
2024-08-21 03:18:58,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=5070050.0, ans=10.0
2024-08-21 03:19:07,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0
2024-08-21 03:19:14,649 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 21 from LS+wenet, 24 from Vox, 43 from AS
2024-08-21 03:19:14,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5070150.0, ans=0.0
2024-08-21 03:19:17,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.95 vs. limit=5.0
2024-08-21 03:19:19,349 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3250, loss[loss=0.101, beats_loss=0.01098, ecapa_loss=0.0001472, whisper_loss=0.08856, over 20648.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001374, whisper_loss=0.09123, over 3802700.41 frames. ], batch size: 86, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:19:20,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5070250.0, ans=0.0
2024-08-21 03:19:20,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=5070250.0, ans=0.05
2024-08-21 03:19:31,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5070250.0, ans=0.125
2024-08-21 03:19:43,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5070350.0, ans=0.95
2024-08-21 03:20:01,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5070450.0, ans=0.125
2024-08-21 03:20:13,332 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 23 from LS+wenet, 19 from Vox, 19 from AS
2024-08-21 03:20:24,127 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.311e+01 2.569e+01 2.814e+01 1.085e+02, threshold=5.138e+01, percent-clipped=2.0
2024-08-21 03:20:34,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5070550.0, ans=0.125
2024-08-21 03:20:48,239 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 from AS
2024-08-21 03:21:03,132 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 27 from Vox, 32 from AS
2024-08-21 03:21:05,445 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 26 from Vox, 33 from AS
2024-08-21 03:21:06,347 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3300, loss[loss=0.1126, beats_loss=0.009418, ecapa_loss=0.0001479, whisper_loss=0.1017, over 22259.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01034, ecapa_loss=0.0001383, whisper_loss=0.0921, over 3821230.97 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:21:15,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5070750.0, ans=0.0
2024-08-21 03:21:24,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5070750.0, ans=0.125
2024-08-21 03:21:35,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5070850.0, ans=0.1
2024-08-21 03:21:45,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0
2024-08-21 03:22:11,121 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 from AS
2024-08-21 03:22:16,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5071050.0, ans=0.1
2024-08-21 03:22:35,364 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 14 from LS+wenet, 17 from Vox, 32 from AS
2024-08-21 03:22:55,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5071150.0, ans=0.0
2024-08-21 03:22:58,851 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3350, loss[loss=0.1006, beats_loss=0.008621, ecapa_loss=0.0001283, whisper_loss=0.09071, over 13247.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01025, ecapa_loss=0.0001379, whisper_loss=0.09224, over 3794097.12 frames. ], batch size: 49, lr: 1.75e-03, grad_scale: 5.764607523034235e+17
2024-08-21 03:23:00,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5071250.0, ans=0.125
2024-08-21 03:23:20,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5071350.0, ans=0.125
2024-08-21 03:24:06,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=5071550.0, ans=0.125
2024-08-21 03:24:09,820 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.266e+01 2.448e+01 2.718e+01 4.054e+01, threshold=4.896e+01, percent-clipped=0.0
2024-08-21 03:24:40,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5071650.0, ans=0.125
2024-08-21 03:24:53,303 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 21 from LS+wenet, 14 from Vox, 17 from AS
2024-08-21 03:24:56,849 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3400, loss[loss=0.1123, beats_loss=0.01101, ecapa_loss=0.000147, whisper_loss=0.09984, over 19282.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01023, ecapa_loss=0.000139, whisper_loss=0.09114, over 3768516.75 frames.
], batch size: 80, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:25:08,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5071750.0, ans=0.1 2024-08-21 03:25:15,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5071750.0, ans=0.07 2024-08-21 03:25:53,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5071950.0, ans=0.2 2024-08-21 03:25:56,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5071950.0, ans=0.0 2024-08-21 03:26:30,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2024-08-21 03:26:35,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-08-21 03:26:36,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5072150.0, ans=0.1 2024-08-21 03:26:48,899 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.337e+01 2024-08-21 03:26:57,266 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3450, loss[loss=0.1117, beats_loss=0.008639, ecapa_loss=0.0001668, whisper_loss=0.1013, over 22919.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01019, ecapa_loss=0.0001394, whisper_loss=0.0913, over 3803747.76 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:27:00,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5072250.0, ans=0.0 2024-08-21 03:27:30,376 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-21 03:28:09,578 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.313e+01 2.518e+01 2.811e+01 5.199e+01, threshold=5.037e+01, percent-clipped=1.0 2024-08-21 03:28:16,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5072550.0, ans=0.1 2024-08-21 03:28:16,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=5072550.0, ans=0.125 2024-08-21 03:28:26,130 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-21 03:28:49,594 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-21 03:28:52,417 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3500, loss[loss=0.09647, beats_loss=0.01201, ecapa_loss=0.0001467, whisper_loss=0.08299, over 21456.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001397, whisper_loss=0.09023, over 3808205.33 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:29:12,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5072850.0, ans=0.0 2024-08-21 03:29:16,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. 
limit=15.0 2024-08-21 03:29:33,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5072950.0, ans=0.125 2024-08-21 03:29:46,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5072950.0, ans=0.125 2024-08-21 03:29:54,195 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 38 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 03:30:33,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-21 03:30:43,415 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 30 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-21 03:30:44,519 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3550, loss[loss=0.112, beats_loss=0.0113, ecapa_loss=0.0001226, whisper_loss=0.09948, over 21654.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.08999, over 3834384.69 frames. ], batch size: 85, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:30:52,555 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 24 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 03:30:57,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5073250.0, ans=0.1 2024-08-21 03:31:02,615 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-21 03:31:18,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5073350.0, ans=0.125 2024-08-21 03:31:42,110 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 
9 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-21 03:31:44,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=5073450.0, ans=0.025 2024-08-21 03:31:52,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.266e+01 2.535e+01 2.803e+01 1.045e+02, threshold=5.070e+01, percent-clipped=1.0 2024-08-21 03:32:38,823 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3600, loss[loss=0.111, beats_loss=0.01188, ecapa_loss=9.681e-05, whisper_loss=0.09816, over 16795.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001393, whisper_loss=0.08935, over 3787238.61 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:32:51,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5073750.0, ans=0.0 2024-08-21 03:33:44,264 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 03:33:58,174 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 03:34:32,039 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-21 03:34:35,481 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3650, loss[loss=0.08765, beats_loss=0.008641, ecapa_loss=0.0001962, whisper_loss=0.07705, over 19274.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01047, ecapa_loss=0.0001387, whisper_loss=0.08873, over 3813625.54 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:34:50,755 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-21 03:35:21,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. 
limit=15.0 2024-08-21 03:35:48,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.270e+01 2.492e+01 2.659e+01 4.040e+01, threshold=4.984e+01, percent-clipped=0.0 2024-08-21 03:35:53,232 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 21 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-21 03:35:57,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5074550.0, ans=0.125 2024-08-21 03:36:01,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5074550.0, ans=0.0 2024-08-21 03:36:07,933 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-21 03:36:25,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5074650.0, ans=0.125 2024-08-21 03:36:34,374 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3700, loss[loss=0.1001, beats_loss=0.0112, ecapa_loss=0.0001165, whisper_loss=0.0877, over 19809.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01045, ecapa_loss=0.000139, whisper_loss=0.08878, over 3805833.97 frames. ], batch size: 75, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:37:10,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5074850.0, ans=0.0 2024-08-21 03:37:10,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2024-08-21 03:37:14,390 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 03:37:20,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5074850.0, ans=0.125 2024-08-21 03:37:21,843 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 15 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-21 03:38:34,077 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3750, loss[loss=0.1067, beats_loss=0.01119, ecapa_loss=0.0001303, whisper_loss=0.09419, over 22141.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01036, ecapa_loss=0.0001398, whisper_loss=0.08968, over 3797393.50 frames. ], batch size: 87, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:39:01,973 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 03:39:20,950 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 27 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-21 03:39:22,364 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 9 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-21 03:39:34,581 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 03:39:50,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.197e+01 2.452e+01 2.774e+01 3.553e+01, threshold=4.904e+01, percent-clipped=0.0 2024-08-21 03:40:19,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5075650.0, ans=0.0 2024-08-21 03:40:21,334 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 03:40:34,830 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 26 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 03:40:35,822 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3800, loss[loss=0.1126, beats_loss=0.01162, ecapa_loss=0.0001587, whisper_loss=0.09935, over 18301.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.08919, over 3774324.50 frames. ], batch size: 75, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:40:46,015 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 03:40:54,457 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-21 03:41:07,609 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 03:41:09,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5075850.0, ans=0.125 2024-08-21 03:41:22,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5075950.0, ans=0.0 2024-08-21 03:41:22,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5075950.0, ans=0.0 2024-08-21 03:41:53,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5076050.0, ans=0.0 2024-08-21 03:42:01,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5076050.0, ans=0.0 2024-08-21 03:42:03,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5076050.0, ans=0.0 2024-08-21 03:42:13,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.03 vs. limit=12.0 2024-08-21 03:42:17,458 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
24 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-21 03:42:29,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5076150.0, ans=0.1 2024-08-21 03:42:38,127 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3850, loss[loss=0.1111, beats_loss=0.008143, ecapa_loss=0.0001298, whisper_loss=0.1017, over 18817.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01033, ecapa_loss=0.0001405, whisper_loss=0.08938, over 3770392.05 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:42:39,964 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.181e+00 2024-08-21 03:42:40,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5076250.0, ans=0.2 2024-08-21 03:42:44,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5076250.0, ans=0.125 2024-08-21 03:43:29,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5076450.0, ans=0.125 2024-08-21 03:43:34,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5076450.0, ans=0.125 2024-08-21 03:43:41,628 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
19 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-21 03:43:47,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5076550.0, ans=0.125 2024-08-21 03:43:50,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.237e+01 2.453e+01 2.696e+01 3.570e+01, threshold=4.906e+01, percent-clipped=0.0 2024-08-21 03:44:12,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5076650.0, ans=0.0 2024-08-21 03:44:38,213 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3900, loss[loss=0.1342, beats_loss=0.007213, ecapa_loss=0.0001498, whisper_loss=0.1255, over 16442.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01039, ecapa_loss=0.0001405, whisper_loss=0.08944, over 3776756.17 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:44:47,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5076750.0, ans=0.125 2024-08-21 03:44:51,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0 2024-08-21 03:45:24,662 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 23 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-21 03:46:08,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5077050.0, ans=0.0 2024-08-21 03:46:17,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5077150.0, ans=0.2 2024-08-21 03:46:30,128 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
40 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-21 03:46:39,108 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 3950, loss[loss=0.09353, beats_loss=0.01275, ecapa_loss=0.0001019, whisper_loss=0.07976, over 23453.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.0905, over 3815650.95 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:46:40,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-08-21 03:47:07,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-08-21 03:47:36,067 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.339e+01 2024-08-21 03:47:39,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5077450.0, ans=0.2 2024-08-21 03:47:51,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.265e+01 2.547e+01 2.985e+01 6.857e+01, threshold=5.095e+01, percent-clipped=1.0 2024-08-21 03:48:20,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5077650.0, ans=0.125 2024-08-21 03:48:25,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5077650.0, ans=0.125 2024-08-21 03:48:32,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5077650.0, ans=0.2 2024-08-21 03:48:39,329 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4000, loss[loss=0.09539, beats_loss=0.009145, ecapa_loss=0.0001718, whisper_loss=0.08453, over 19206.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001405, whisper_loss=0.09076, over 3833031.87 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:48:59,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=5077750.0, ans=0.1 2024-08-21 03:49:12,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5077850.0, ans=0.0 2024-08-21 03:49:14,561 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-21 03:49:20,098 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 36 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-21 03:49:25,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5077850.0, ans=0.125 2024-08-21 03:49:54,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.10 vs. limit=22.5 2024-08-21 03:50:01,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5078050.0, ans=0.0 2024-08-21 03:50:19,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5078150.0, ans=0.125 2024-08-21 03:50:45,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5078150.0, ans=0.0 2024-08-21 03:50:45,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2024-08-21 03:50:48,819 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4050, loss[loss=0.09276, beats_loss=0.01205, ecapa_loss=0.0001299, whisper_loss=0.07941, over 22696.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.09075, over 3847933.26 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:50:50,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5078250.0, ans=0.95 2024-08-21 03:51:10,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-08-21 03:51:42,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5078450.0, ans=0.125 2024-08-21 03:51:48,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-21 03:51:59,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5078450.0, ans=0.125 2024-08-21 03:52:04,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5078450.0, ans=0.125 2024-08-21 03:52:10,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.320e+01 2.567e+01 2.893e+01 7.952e+01, threshold=5.134e+01, percent-clipped=3.0 2024-08-21 03:52:34,268 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-21 03:52:37,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5078650.0, ans=0.0 2024-08-21 03:52:53,260 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 
20 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-21 03:52:56,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.03 vs. limit=22.5 2024-08-21 03:53:00,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5078750.0, ans=0.1 2024-08-21 03:53:01,435 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4100, loss[loss=0.08273, beats_loss=0.01104, ecapa_loss=0.0001704, whisper_loss=0.06998, over 15103.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.09065, over 3867409.62 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:53:08,158 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-21 03:53:16,245 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0443761944770813, model_norm_threshold=51.335693359375 2024-08-21 03:53:16,413 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.724e+05, grad_sumsq=2.524e+07, orig_rms_sq=1.079e-02 2024-08-21 03:53:22,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5078750.0, ans=0.125 2024-08-21 03:53:50,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.84 vs. 
limit=12.0 2024-08-21 03:53:59,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5078950.0, ans=0.125 2024-08-21 03:54:07,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5078950.0, ans=0.125 2024-08-21 03:54:10,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2024-08-21 03:54:25,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5079050.0, ans=0.0 2024-08-21 03:55:10,839 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4150, loss[loss=0.0916, beats_loss=0.01141, ecapa_loss=0.0001485, whisper_loss=0.0787, over 19066.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.0907, over 3878065.24 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:55:22,436 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 03:55:25,023 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 27 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-21 03:55:43,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5079350.0, ans=0.125 2024-08-21 03:56:32,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.340e+01 2.592e+01 2.879e+01 1.157e+03, threshold=5.184e+01, percent-clipped=4.0 2024-08-21 03:56:48,132 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 
20 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-21 03:56:48,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5079550.0, ans=0.0 2024-08-21 03:56:51,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5079550.0, ans=0.125 2024-08-21 03:56:59,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5079650.0, ans=0.125 2024-08-21 03:57:17,871 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4200, loss[loss=0.07358, beats_loss=0.0108, ecapa_loss=0.0001639, whisper_loss=0.06114, over 14580.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.0001396, whisper_loss=0.09078, over 3841193.68 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:57:18,097 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-21 03:57:22,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5079750.0, ans=0.125 2024-08-21 03:57:30,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5079750.0, ans=0.0 2024-08-21 03:57:55,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5079850.0, ans=0.125 2024-08-21 03:58:00,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5079850.0, ans=0.0 2024-08-21 03:58:15,112 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-508000.pt 2024-08-21 03:59:16,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=5080150.0, ans=0.0 2024-08-21 03:59:20,462 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4250, loss[loss=0.1053, beats_loss=0.009288, ecapa_loss=0.0001323, whisper_loss=0.09465, over 23813.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01036, ecapa_loss=0.0001391, whisper_loss=0.09108, over 3805571.84 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 03:59:32,973 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 03:59:35,467 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
25 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-21 04:00:00,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=5080350.0, ans=15.0 2024-08-21 04:00:10,678 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 12 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-21 04:00:40,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.302e+01 2.518e+01 2.832e+01 1.053e+02, threshold=5.035e+01, percent-clipped=1.0 2024-08-21 04:00:58,458 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-21 04:01:18,464 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 21 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-21 04:01:22,739 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 36 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 04:01:28,533 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-21 04:01:29,540 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4300, loss[loss=0.1001, beats_loss=0.01105, ecapa_loss=0.0001127, whisper_loss=0.08793, over 22658.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001392, whisper_loss=0.09077, over 3784428.09 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:01:38,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5080750.0, ans=0.0 2024-08-21 04:02:14,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2024-08-21 04:02:36,579 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
12 from LS+wenet, 17 from Vox, 29 from AS 2024-08-21 04:02:50,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-08-21 04:03:04,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5081050.0, ans=0.025 2024-08-21 04:03:30,187 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4350, loss[loss=0.09471, beats_loss=0.01017, ecapa_loss=0.0001386, whisper_loss=0.08315, over 21418.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001372, whisper_loss=0.09072, over 3815786.90 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:04:22,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5081450.0, ans=0.125 2024-08-21 04:04:27,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5081450.0, ans=0.0 2024-08-21 04:04:37,004 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+01 2.205e+01 2.430e+01 2.775e+01 4.634e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 04:04:46,304 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 30 from LS+wenet, 16 from Vox, 47 from AS 2024-08-21 04:04:53,073 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 29 from LS+wenet, 20 from Vox, 35 from AS 2024-08-21 04:05:02,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5081650.0, ans=0.1 2024-08-21 04:05:19,895 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4400, loss[loss=0.09126, beats_loss=0.01019, ecapa_loss=0.0001204, whisper_loss=0.07987, over 14184.00 frames.
], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001363, whisper_loss=0.0902, over 3815301.30 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:05:55,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5081850.0, ans=0.0 2024-08-21 04:07:20,080 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 from AS 2024-08-21 04:07:26,071 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 26 from LS+wenet, 25 from Vox, 27 from AS 2024-08-21 04:07:26,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5082150.0, ans=0.0 2024-08-21 04:07:31,853 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4450, loss[loss=0.09253, beats_loss=0.01055, ecapa_loss=0.0001346, whisper_loss=0.08063, over 16881.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01036, ecapa_loss=0.0001377, whisper_loss=0.09032, over 3815201.78 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:07:50,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5082250.0, ans=0.125 2024-08-21 04:08:09,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2024-08-21 04:08:21,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=10.0 2024-08-21 04:08:27,646 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 24 from LS+wenet, 30 from Vox, 32 from AS 2024-08-21 04:08:33,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.40 vs.
limit=15.0 2024-08-21 04:08:51,754 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.213e+01 2.422e+01 2.731e+01 3.413e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-21 04:09:08,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5082550.0, ans=0.2 2024-08-21 04:09:14,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=5082650.0, ans=0.025 2024-08-21 04:09:29,854 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 from AS 2024-08-21 04:09:42,529 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4500, loss[loss=0.1136, beats_loss=0.01072, ecapa_loss=0.0001445, whisper_loss=0.1014, over 23276.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001383, whisper_loss=0.09092, over 3844829.32 frames. ], batch size: 92, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:10:08,822 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-21 04:10:21,874 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 36 from LS+wenet, 20 from Vox, 31 from AS 2024-08-21 04:10:35,814 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 from AS 2024-08-21 04:11:04,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-08-21 04:11:37,253 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 18 from Vox, 23 from AS 2024-08-21 04:11:47,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5083150.0, ans=0.125 2024-08-21 04:11:52,352 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts.
26 from LS+wenet, 20 from Vox, 44 from AS 2024-08-21 04:11:53,288 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4550, loss[loss=0.0964, beats_loss=0.01276, ecapa_loss=0.0001428, whisper_loss=0.08221, over 21596.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001385, whisper_loss=0.09089, over 3811588.22 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:11:54,434 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 17 from Vox, 42 from AS 2024-08-21 04:12:41,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5083350.0, ans=0.0 2024-08-21 04:12:44,815 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 from AS 2024-08-21 04:13:06,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5083450.0, ans=0.1 2024-08-21 04:13:13,997 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.342e+01 2.629e+01 2.950e+01 5.025e+01, threshold=5.258e+01, percent-clipped=1.0 2024-08-21 04:13:15,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5083550.0, ans=0.125 2024-08-21 04:13:54,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=5083650.0, ans=15.0 2024-08-21 04:14:04,422 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4600, loss[loss=0.1164, beats_loss=0.009065, ecapa_loss=0.0001403, whisper_loss=0.1059, over 15568.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001382, whisper_loss=0.0902, over 3821965.08 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:14:15,189 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts.
18 from LS+wenet, 14 from Vox, 21 from AS 2024-08-21 04:14:49,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5083850.0, ans=0.0 2024-08-21 04:15:26,706 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 34 from LS+wenet, 24 from Vox, 33 from AS 2024-08-21 04:15:48,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=5084150.0, ans=0.0 2024-08-21 04:15:50,478 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 19 from LS+wenet, 15 from Vox, 17 from AS 2024-08-21 04:15:53,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5084150.0, ans=0.125 2024-08-21 04:15:55,756 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 from AS 2024-08-21 04:15:56,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5084150.0, ans=0.1 2024-08-21 04:16:07,705 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4650, loss[loss=0.07516, beats_loss=0.01281, ecapa_loss=0.0001212, whisper_loss=0.06114, over 18853.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001382, whisper_loss=0.09007, over 3840629.01 frames. ], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:16:14,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-08-21 04:16:24,296 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 30 from Vox, 34 from AS 2024-08-21 04:16:55,880 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts.
14 from LS+wenet, 15 from Vox, 22 from AS 2024-08-21 04:17:27,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.290e+01 2.552e+01 2.874e+01 1.481e+02, threshold=5.104e+01, percent-clipped=2.0 2024-08-21 04:17:30,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2024-08-21 04:17:39,067 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 27 from LS+wenet, 18 from Vox, 36 from AS 2024-08-21 04:17:53,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5084650.0, ans=0.125 2024-08-21 04:17:54,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.68 vs. limit=6.0 2024-08-21 04:18:09,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-08-21 04:18:14,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5084750.0, ans=0.1 2024-08-21 04:18:14,921 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4700, loss[loss=0.07787, beats_loss=0.0128, ecapa_loss=0.0001028, whisper_loss=0.06404, over 14347.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.09011, over 3850957.50 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:18:18,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5084750.0, ans=0.0 2024-08-21 04:18:36,618 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts.
19 from LS+wenet, 17 from Vox, 18 from AS 2024-08-21 04:19:33,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-08-21 04:19:38,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5085050.0, ans=0.0 2024-08-21 04:19:54,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0 2024-08-21 04:20:01,205 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 14 from LS+wenet, 13 from Vox, 25 from AS 2024-08-21 04:20:06,804 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 from AS 2024-08-21 04:20:11,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5085150.0, ans=0.07 2024-08-21 04:20:16,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5085150.0, ans=0.125 2024-08-21 04:20:25,053 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4750, loss[loss=0.1111, beats_loss=0.009168, ecapa_loss=0.0001424, whisper_loss=0.1005, over 19897.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01041, ecapa_loss=0.0001394, whisper_loss=0.08962, over 3836598.61 frames.
], batch size: 81, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:20:37,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5085250.0, ans=0.125 2024-08-21 04:21:16,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=5085450.0, ans=15.0 2024-08-21 04:21:41,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. limit=10.0 2024-08-21 04:21:44,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.249e+01 2.433e+01 2.723e+01 6.483e+01, threshold=4.865e+01, percent-clipped=1.0 2024-08-21 04:21:58,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=12.0 2024-08-21 04:22:01,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5085550.0, ans=0.0 2024-08-21 04:22:28,942 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 from AS 2024-08-21 04:22:33,061 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4800, loss[loss=0.1137, beats_loss=0.006719, ecapa_loss=0.000167, whisper_loss=0.1053, over 16773.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001391, whisper_loss=0.08922, over 3823366.33 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:22:37,639 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts.
29 from LS+wenet, 20 from Vox, 29 from AS 2024-08-21 04:22:53,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5085750.0, ans=0.0 2024-08-21 04:22:56,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5085750.0, ans=0.125 2024-08-21 04:23:16,349 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 from AS 2024-08-21 04:23:25,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5085950.0, ans=0.125 2024-08-21 04:23:42,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5085950.0, ans=0.125 2024-08-21 04:23:53,504 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 21 from LS+wenet, 23 from Vox, 43 from AS 2024-08-21 04:23:56,337 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-21 04:23:56,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.40 vs. limit=22.5 2024-08-21 04:24:09,277 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 22 from LS+wenet, 12 from Vox, 19 from AS 2024-08-21 04:24:38,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.52 vs. limit=22.5 2024-08-21 04:24:39,305 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4850, loss[loss=0.09278, beats_loss=0.01173, ecapa_loss=0.0001145, whisper_loss=0.0799, over 19197.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.08946, over 3835507.69 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:24:46,282 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts.
33 from LS+wenet, 25 from Vox, 35 from AS 2024-08-21 04:25:01,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5086250.0, ans=0.0 2024-08-21 04:25:04,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=5086350.0, ans=0.2 2024-08-21 04:25:21,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2024-08-21 04:25:56,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.255e+01 2.433e+01 2.647e+01 4.364e+01, threshold=4.866e+01, percent-clipped=0.0 2024-08-21 04:26:09,539 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 28 from LS+wenet, 20 from Vox, 29 from AS 2024-08-21 04:26:42,447 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4900, loss[loss=0.1057, beats_loss=0.009965, ecapa_loss=0.000107, whisper_loss=0.09463, over 14009.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001386, whisper_loss=0.09005, over 3814118.98 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:26:49,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=5086750.0, ans=10.0 2024-08-21 04:27:04,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5086750.0, ans=0.0 2024-08-21 04:27:06,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2024-08-21 04:27:17,650 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts.
29 from LS+wenet, 20 from Vox, 33 from AS 2024-08-21 04:27:17,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5086850.0, ans=0.2 2024-08-21 04:27:20,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-21 04:27:28,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5086850.0, ans=0.125 2024-08-21 04:27:45,841 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 from AS 2024-08-21 04:28:36,672 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS 2024-08-21 04:28:52,730 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 4950, loss[loss=0.1173, beats_loss=0.007901, ecapa_loss=0.0001568, whisper_loss=0.1079, over 15877.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001393, whisper_loss=0.08958, over 3813012.21 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:28:59,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=5087250.0, ans=15.0 2024-08-21 04:29:18,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5087350.0, ans=0.125 2024-08-21 04:29:44,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5087450.0, ans=0.125 2024-08-21 04:29:55,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=5087450.0, ans=0.2 2024-08-21 04:30:03,236 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts.
28 from LS+wenet, 15 from Vox, 32 from AS 2024-08-21 04:30:06,403 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 20 from LS+wenet, 23 from Vox, 41 from AS 2024-08-21 04:30:14,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.296e+01 2.472e+01 2.859e+01 4.220e+01, threshold=4.943e+01, percent-clipped=0.0 2024-08-21 04:30:33,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=8.0 2024-08-21 04:30:50,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5087650.0, ans=0.125 2024-08-21 04:31:04,951 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5000, loss[loss=0.1146, beats_loss=0.009222, ecapa_loss=0.000144, whisper_loss=0.1039, over 18656.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001383, whisper_loss=0.09019, over 3845754.84 frames. ], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:31:20,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5087750.0, ans=0.125 2024-08-21 04:31:30,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5087850.0, ans=0.2 2024-08-21 04:31:57,179 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 18 from LS+wenet, 21 from Vox, 24 from AS 2024-08-21 04:32:10,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5087950.0, ans=0.0 2024-08-21 04:32:15,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5087950.0, ans=0.125 2024-08-21 04:32:26,076 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts.
16 from LS+wenet, 24 from Vox, 44 from AS 2024-08-21 04:32:28,726 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 27 from Vox, 37 from AS 2024-08-21 04:33:08,082 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5050, loss[loss=0.1135, beats_loss=0.01209, ecapa_loss=0.000159, whisper_loss=0.09982, over 19911.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.000139, whisper_loss=0.08972, over 3859449.86 frames. ], batch size: 84, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:34:07,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5088450.0, ans=0.125 2024-08-21 04:34:24,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.200e+01 2.422e+01 2.716e+01 3.329e+01, threshold=4.844e+01, percent-clipped=0.0 2024-08-21 04:34:29,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5088550.0, ans=0.125 2024-08-21 04:34:30,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5088550.0, ans=0.2 2024-08-21 04:35:00,418 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 16 from LS+wenet, 21 from Vox, 44 from AS 2024-08-21 04:35:03,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5088650.0, ans=0.2 2024-08-21 04:35:10,908 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5100, loss[loss=0.1153, beats_loss=0.01098, ecapa_loss=0.0001332, whisper_loss=0.103, over 23771.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001386, whisper_loss=0.08939, over 3844291.84 frames.
], batch size: 93, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:35:22,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5088750.0, ans=0.0 2024-08-21 04:35:25,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5088750.0, ans=0.0 2024-08-21 04:35:26,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5088750.0, ans=0.125 2024-08-21 04:35:30,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5088750.0, ans=0.125 2024-08-21 04:36:06,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=5088950.0, ans=0.0 2024-08-21 04:36:59,298 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-21 04:36:59,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.98 vs. limit=22.5 2024-08-21 04:37:05,335 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5150, loss[loss=0.103, beats_loss=0.00947, ecapa_loss=0.0001658, whisper_loss=0.09183, over 19549.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01056, ecapa_loss=0.0001381, whisper_loss=0.08944, over 3866114.46 frames.
], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:37:12,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5089250.0, ans=0.125 2024-08-21 04:37:31,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5089350.0, ans=0.2 2024-08-21 04:37:37,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5089350.0, ans=0.125 2024-08-21 04:37:40,443 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 34 from LS+wenet, 18 from Vox, 36 from AS 2024-08-21 04:37:40,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5089350.0, ans=0.0 2024-08-21 04:37:47,896 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 from AS 2024-08-21 04:38:12,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.286e+01 2.550e+01 3.060e+01 1.523e+02, threshold=5.101e+01, percent-clipped=5.0 2024-08-21 04:38:35,949 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 22 from LS+wenet, 21 from Vox, 44 from AS 2024-08-21 04:38:56,241 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5200, loss[loss=0.08712, beats_loss=0.01337, ecapa_loss=0.00013, whisper_loss=0.07245, over 22472.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001384, whisper_loss=0.09005, over 3866810.08 frames. ], batch size: 95, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:39:33,124 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 from AS 2024-08-21 04:39:56,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=5089950.0, ans=0.2 2024-08-21 04:40:24,956 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts.
29 from LS+wenet, 20 from Vox, 39 from AS 2024-08-21 04:40:27,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.54 vs. limit=22.5 2024-08-21 04:40:47,227 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5250, loss[loss=0.0902, beats_loss=0.01103, ecapa_loss=0.0001266, whisper_loss=0.0779, over 13204.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.000138, whisper_loss=0.09076, over 3831113.50 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:40:51,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=5090250.0, ans=0.125 2024-08-21 04:41:16,601 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 from AS 2024-08-21 04:41:27,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5090350.0, ans=0.0 2024-08-21 04:41:39,162 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts.
26 from LS+wenet, 20 from Vox, 27 from AS 2024-08-21 04:41:43,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5090450.0, ans=0.0 2024-08-21 04:41:43,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5090450.0, ans=0.2 2024-08-21 04:41:58,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.269e+01 2.486e+01 2.907e+01 3.986e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-21 04:42:04,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5090550.0, ans=0.0 2024-08-21 04:42:40,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5090650.0, ans=0.125 2024-08-21 04:42:42,845 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5300, loss[loss=0.09391, beats_loss=0.01115, ecapa_loss=0.0001353, whisper_loss=0.08141, over 14836.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001372, whisper_loss=0.09063, over 3828403.30 frames. ], batch size: 62, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:43:23,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5090850.0, ans=0.1 2024-08-21 04:43:23,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs.
limit=15.0 2024-08-21 04:43:54,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5091050.0, ans=10.0 2024-08-21 04:43:59,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5091050.0, ans=0.2 2024-08-21 04:43:59,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-08-21 04:44:20,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5091150.0, ans=0.1 2024-08-21 04:44:31,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5091150.0, ans=0.1 2024-08-21 04:44:42,690 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5350, loss[loss=0.09944, beats_loss=0.01005, ecapa_loss=0.0001368, whisper_loss=0.08802, over 20523.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001373, whisper_loss=0.08981, over 3813951.37 frames. ], batch size: 80, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:44:56,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5091250.0, ans=0.1 2024-08-21 04:45:21,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5091350.0, ans=0.1 2024-08-21 04:45:27,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5091350.0, ans=0.125 2024-08-21 04:45:49,713 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 
18 from LS+wenet, 15 from Vox, 17 from AS 2024-08-21 04:46:00,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.175e+01 2.396e+01 2.648e+01 3.168e+01, threshold=4.792e+01, percent-clipped=0.0 2024-08-21 04:46:19,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5091550.0, ans=0.09899494936611666 2024-08-21 04:46:27,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5091650.0, ans=0.125 2024-08-21 04:46:48,412 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5400, loss[loss=0.07861, beats_loss=0.0137, ecapa_loss=0.0001045, whisper_loss=0.06386, over 16925.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001376, whisper_loss=0.09031, over 3838686.31 frames. ], batch size: 71, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:46:53,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2024-08-21 04:46:55,451 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-21 04:47:15,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5091850.0, ans=0.125 2024-08-21 04:47:15,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5091850.0, ans=0.0 2024-08-21 04:47:50,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5091950.0, ans=0.2 2024-08-21 04:47:56,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5091950.0, ans=0.1 2024-08-21 04:48:02,015 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts.
21 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-21 04:48:18,059 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 14 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 04:48:36,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2024-08-21 04:48:46,186 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 04:48:57,117 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5450, loss[loss=0.09104, beats_loss=0.01306, ecapa_loss=0.000122, whisper_loss=0.07677, over 21874.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001365, whisper_loss=0.091, over 3863413.21 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:50:18,958 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.233e+01 2.525e+01 2.938e+01 2.405e+02, threshold=5.050e+01, percent-clipped=4.0 2024-08-21 04:50:29,188 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-21 04:50:45,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5092650.0, ans=0.125 2024-08-21 04:51:09,340 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5500, loss[loss=0.1203, beats_loss=0.007278, ecapa_loss=0.0002047, whisper_loss=0.111, over 16100.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001368, whisper_loss=0.09119, over 3856783.98 frames. 
], batch size: 66, lr: 1.75e-03, grad_scale: 1.152921504606847e+18 2024-08-21 04:51:15,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5092750.0, ans=0.2 2024-08-21 04:51:37,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5092850.0, ans=0.0 2024-08-21 04:51:53,002 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.967e+05 2024-08-21 04:51:53,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5092850.0, ans=0.125 2024-08-21 04:52:10,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5092950.0, ans=0.125 2024-08-21 04:52:24,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5093050.0, ans=0.125 2024-08-21 04:52:29,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5093050.0, ans=0.125 2024-08-21 04:52:38,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.98 vs. limit=10.0 2024-08-21 04:53:00,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5093150.0, ans=0.0 2024-08-21 04:53:14,425 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-21 04:53:20,894 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5550, loss[loss=0.08178, beats_loss=0.008325, ecapa_loss=0.0001989, whisper_loss=0.07147, over 13566.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001375, whisper_loss=0.09088, over 3819337.54 frames. 
], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:53:59,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.06 vs. limit=15.0 2024-08-21 04:54:09,966 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 04:54:24,058 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-21 04:54:48,034 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.248e+01 2.482e+01 2.824e+01 3.933e+01, threshold=4.964e+01, percent-clipped=0.0 2024-08-21 04:55:01,574 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 18 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-21 04:55:09,349 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-21 04:55:23,836 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-21 04:55:24,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5093650.0, ans=0.2 2024-08-21 04:55:33,852 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5600, loss[loss=0.101, beats_loss=0.006925, ecapa_loss=0.0001727, whisper_loss=0.09231, over 14432.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.000138, whisper_loss=0.09067, over 3812953.42 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:55:43,718 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 04:56:31,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.00 vs. limit=10.0 2024-08-21 04:57:14,571 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
17 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-21 04:57:16,844 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 04:57:35,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5094250.0, ans=0.125 2024-08-21 04:57:35,948 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5650, loss[loss=0.0778, beats_loss=0.01116, ecapa_loss=0.0001098, whisper_loss=0.06554, over 13922.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001386, whisper_loss=0.09041, over 3789501.84 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:57:53,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=15.0 2024-08-21 04:58:04,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5094350.0, ans=0.0 2024-08-21 04:58:49,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.209e+01 2.456e+01 2.705e+01 6.075e+01, threshold=4.911e+01, percent-clipped=1.0 2024-08-21 04:59:04,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5094550.0, ans=0.1 2024-08-21 04:59:15,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-21 04:59:34,815 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5700, loss[loss=0.1042, beats_loss=0.009761, ecapa_loss=0.0001408, whisper_loss=0.09302, over 16870.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.0001392, whisper_loss=0.09011, over 3806698.93 frames. 
], batch size: 65, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 04:59:36,586 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 27 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-21 04:59:42,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5094750.0, ans=0.125 2024-08-21 04:59:47,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5094750.0, ans=0.125 2024-08-21 05:00:20,486 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-21 05:00:26,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=5094950.0, ans=0.09899494936611666 2024-08-21 05:00:27,798 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 17 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-21 05:00:37,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2024-08-21 05:00:50,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=12.0 2024-08-21 05:00:59,741 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
17 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-21 05:01:00,075 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=8.613e+00 2024-08-21 05:01:02,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5095150.0, ans=0.125 2024-08-21 05:01:15,427 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:01:25,337 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5750, loss[loss=0.09885, beats_loss=0.009758, ecapa_loss=0.0001802, whisper_loss=0.08729, over 16177.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.09009, over 3799643.86 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:02:03,008 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-21 05:02:07,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-21 05:02:38,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.297e+01 2.497e+01 2.736e+01 4.299e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-21 05:02:52,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5095550.0, ans=0.125 2024-08-21 05:03:08,211 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 21 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-21 05:03:20,583 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5800, loss[loss=0.1125, beats_loss=0.008718, ecapa_loss=0.000144, whisper_loss=0.1023, over 20589.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0103, ecapa_loss=0.0001397, whisper_loss=0.09023, over 3829717.41 frames. 
], batch size: 79, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:03:22,139 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 24 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 05:03:26,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5095750.0, ans=0.125 2024-08-21 05:03:51,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=5095850.0, ans=0.125 2024-08-21 05:03:58,424 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-21 05:04:09,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5095950.0, ans=0.125 2024-08-21 05:04:09,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5095950.0, ans=0.1 2024-08-21 05:04:11,760 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-21 05:04:12,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-08-21 05:04:19,452 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 05:04:43,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5096050.0, ans=0.0 2024-08-21 05:04:48,488 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 18 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-21 05:04:48,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5096050.0, ans=0.2 2024-08-21 05:05:08,450 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
18 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-21 05:05:10,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5096250.0, ans=0.0 2024-08-21 05:05:11,384 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5850, loss[loss=0.0912, beats_loss=0.01115, ecapa_loss=0.000105, whisper_loss=0.079, over 14791.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01032, ecapa_loss=0.0001389, whisper_loss=0.09058, over 3838303.09 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:05:49,696 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 05:05:53,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5096450.0, ans=0.1 2024-08-21 05:06:08,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5096450.0, ans=0.1 2024-08-21 05:06:14,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.298e+01 2.570e+01 2.824e+01 3.912e+01, threshold=5.140e+01, percent-clipped=0.0 2024-08-21 05:06:42,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0 2024-08-21 05:06:50,868 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5900, loss[loss=0.1205, beats_loss=0.008776, ecapa_loss=0.0001601, whisper_loss=0.1102, over 19260.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001392, whisper_loss=0.08971, over 3792500.04 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:06:59,005 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 05:06:59,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5096750.0, ans=0.0 2024-08-21 05:07:07,019 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 32 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-21 05:07:08,978 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-21 05:07:11,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5096850.0, ans=0.125 2024-08-21 05:07:15,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5096850.0, ans=0.125 2024-08-21 05:07:17,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5096850.0, ans=0.125 2024-08-21 05:07:19,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5096850.0, ans=0.125 2024-08-21 05:07:38,364 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-21 05:07:38,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5096950.0, ans=0.0 2024-08-21 05:07:38,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5096950.0, ans=0.125 2024-08-21 05:07:46,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5096950.0, ans=0.125 2024-08-21 05:07:48,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5096950.0, ans=0.025 2024-08-21 05:07:54,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5097050.0, ans=0.0 2024-08-21 05:08:15,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5097150.0, ans=0.125 2024-08-21 05:08:24,152 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-21 05:08:28,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5097150.0, ans=0.0 2024-08-21 05:08:35,849 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 5950, loss[loss=0.1298, beats_loss=0.008257, ecapa_loss=0.0001763, whisper_loss=0.1198, over 18227.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.09057, over 3826698.09 frames. 
], batch size: 73, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:09:00,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5097350.0, ans=0.125 2024-08-21 05:09:25,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-08-21 05:09:27,614 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-21 05:09:43,022 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.327e+01 2.629e+01 2.901e+01 4.645e+01, threshold=5.259e+01, percent-clipped=0.0 2024-08-21 05:09:44,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-08-21 05:09:51,553 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 23 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-21 05:09:58,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0 2024-08-21 05:10:02,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5097650.0, ans=0.0 2024-08-21 05:10:12,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0 2024-08-21 05:10:16,079 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6000, loss[loss=0.08665, beats_loss=0.01394, ecapa_loss=0.0001293, whisper_loss=0.07141, over 21074.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.000139, whisper_loss=0.09015, over 3795749.11 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:10:16,080 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-21 05:10:53,987 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005022, whisper_loss=0.2487, over 931116.00 frames. 2024-08-21 05:11:14,918 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.8832, 3.8111, 3.1251, 3.3699], device='cuda:0') 2024-08-21 05:11:19,553 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on SV_voxceleb1: loss=0.003907, beats_loss=0, ecapa_loss=0.0003907, whisper_loss=0, over 944235.00 frames. 2024-08-21 05:13:02,948 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on AT_audioset: loss=0.023, beats_loss=0.023, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 05:13:02,953 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-21 05:13:10,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5097750.0, ans=0.125 2024-08-21 05:13:28,714 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-21 05:13:35,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5097850.0, ans=0.125 2024-08-21 05:13:36,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=12.0 2024-08-21 05:13:45,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2024-08-21 05:13:55,214 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-21 05:14:17,446 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 23 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-21 05:14:30,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5098150.0, ans=0.07 2024-08-21 05:14:34,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=5098150.0, ans=15.0 2024-08-21 05:14:37,231 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6050, loss[loss=0.07831, beats_loss=0.01312, ecapa_loss=0.0001507, whisper_loss=0.06368, over 18699.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001381, whisper_loss=0.09021, over 3811990.51 frames. ], batch size: 78, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:14:38,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2024-08-21 05:14:53,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5098250.0, ans=0.1 2024-08-21 05:15:11,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5098350.0, ans=0.1 2024-08-21 05:15:17,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5098450.0, ans=0.0 2024-08-21 05:15:26,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2024-08-21 05:15:30,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=5098450.0, ans=0.125 2024-08-21 05:15:34,093 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-21 05:15:39,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.180e+01 2.501e+01 2.659e+01 4.695e+01, threshold=5.002e+01, percent-clipped=0.0 2024-08-21 05:15:56,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5098650.0, ans=0.0 2024-08-21 05:15:56,712 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.417e+00 2024-08-21 05:16:02,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5098650.0, ans=0.0 2024-08-21 05:16:02,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5098650.0, ans=0.125 2024-08-21 05:16:05,892 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 24 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 05:16:08,092 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 05:16:12,977 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6100, loss[loss=0.08509, beats_loss=0.01014, ecapa_loss=0.0001499, whisper_loss=0.07346, over 16681.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001377, whisper_loss=0.08964, over 3814940.01 frames. ], batch size: 67, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:16:16,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-21 05:16:24,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5098750.0, ans=0.2 2024-08-21 05:16:27,312 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 
20 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-21 05:16:46,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5098850.0, ans=0.2 2024-08-21 05:17:09,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5098950.0, ans=0.2 2024-08-21 05:17:09,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5098950.0, ans=0.1 2024-08-21 05:17:17,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5099050.0, ans=0.125 2024-08-21 05:17:31,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5099150.0, ans=0.0 2024-08-21 05:17:35,644 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 05:17:45,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5099150.0, ans=0.1 2024-08-21 05:17:47,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5099150.0, ans=0.2 2024-08-21 05:17:52,002 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6150, loss[loss=0.08638, beats_loss=0.01113, ecapa_loss=0.0001463, whisper_loss=0.0738, over 21634.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001368, whisper_loss=0.09085, over 3837671.81 frames. ], batch size: 91, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:18:01,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5099250.0, ans=0.125 2024-08-21 05:18:17,429 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 05:18:30,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=5099450.0, ans=0.05 2024-08-21 05:18:40,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=5099450.0, ans=0.05 2024-08-21 05:18:40,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=5099450.0, ans=0.5 2024-08-21 05:18:50,583 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 25 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 05:18:53,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.269e+01 2.480e+01 2.871e+01 4.819e+02, threshold=4.960e+01, percent-clipped=2.0 2024-08-21 05:19:01,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5099550.0, ans=0.0 2024-08-21 05:19:26,654 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6200, loss[loss=0.09568, beats_loss=0.01262, ecapa_loss=0.0001432, whisper_loss=0.08163, over 21353.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001369, whisper_loss=0.09087, over 3825376.36 frames. ], batch size: 89, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:19:44,942 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 
26 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-21 05:20:05,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5099950.0, ans=0.125 2024-08-21 05:20:44,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5100050.0, ans=0.125 2024-08-21 05:20:48,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2024-08-21 05:20:49,978 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 15 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-21 05:20:54,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5100150.0, ans=0.125 2024-08-21 05:21:00,425 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 34 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 05:21:08,199 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6250, loss[loss=0.1177, beats_loss=0.01121, ecapa_loss=0.0001229, whisper_loss=0.1053, over 22747.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.0001374, whisper_loss=0.09124, over 3835932.15 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:21:19,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5100250.0, ans=0.0 2024-08-21 05:21:28,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5100350.0, ans=0.125 2024-08-21 05:21:31,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2024-08-21 05:21:42,173 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-21 05:21:46,216 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-21 05:22:02,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5100450.0, ans=0.125 2024-08-21 05:22:12,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.226e+01 2.520e+01 2.834e+01 9.847e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-21 05:22:19,336 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 20 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 05:22:27,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5100650.0, ans=0.2 2024-08-21 05:22:38,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5100650.0, ans=0.125 2024-08-21 05:22:46,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5100650.0, ans=0.0 2024-08-21 05:22:49,578 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6300, loss[loss=0.07373, beats_loss=0.01076, ecapa_loss=0.0001863, whisper_loss=0.06111, over 20456.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001385, whisper_loss=0.09086, over 3854678.46 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:22:54,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2024-08-21 05:23:12,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5100850.0, ans=0.0 2024-08-21 05:23:32,631 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
20 from LS+wenet, 10 from Vox, 28 from AS 2024-08-21 05:23:38,643 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 30 from LS+wenet, 15 from Vox, 41 from AS 2024-08-21 05:23:49,130 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 17 from LS+wenet, 22 from Vox, 31 from AS 2024-08-21 05:23:49,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5101050.0, ans=0.125 2024-08-21 05:23:57,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5101050.0, ans=0.125 2024-08-21 05:24:10,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=5101150.0, ans=0.0 2024-08-21 05:24:26,801 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6350, loss[loss=0.1048, beats_loss=0.01027, ecapa_loss=0.0001044, whisper_loss=0.09347, over 23272.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001381, whisper_loss=0.08978, over 3843532.21 frames. ], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:24:36,428 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 from AS 2024-08-21 05:24:40,406 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:24:59,155 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.587e+05 2024-08-21 05:25:11,425 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
16 from LS+wenet, 25 from Vox, 22 from AS 2024-08-21 05:25:29,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5101550.0, ans=0.04949747468305833 2024-08-21 05:25:30,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.267e+01 2.496e+01 2.800e+01 3.336e+01, threshold=4.993e+01, percent-clipped=1.0 2024-08-21 05:25:32,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5101550.0, ans=0.125 2024-08-21 05:25:55,036 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 18 from LS+wenet, 19 from Vox, 43 from AS 2024-08-21 05:26:04,601 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6400, loss[loss=0.104, beats_loss=0.008853, ecapa_loss=0.0001211, whisper_loss=0.09392, over 18016.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001377, whisper_loss=0.0894, over 3822737.00 frames. ], batch size: 66, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:26:07,863 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.211e+00 2024-08-21 05:26:13,257 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0562300942838192, model_norm_threshold=49.92792510986328 2024-08-21 05:26:13,427 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.1.norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.997e+04, grad_sumsq=6.997e+04, orig_rms_sq=1.000e+00 2024-08-21 05:26:14,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=5101750.0, ans=0.07 2024-08-21 05:26:18,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5101750.0, ans=0.125 2024-08-21 05:26:25,996 INFO [scaling.py:214] (0/4) ScheduledFloat: 
name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5101850.0, ans=0.125 2024-08-21 05:26:44,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5101950.0, ans=0.07 2024-08-21 05:26:51,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5101950.0, ans=0.025 2024-08-21 05:26:57,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5101950.0, ans=0.125 2024-08-21 05:27:06,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5102050.0, ans=0.0 2024-08-21 05:27:08,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5102050.0, ans=0.125 2024-08-21 05:27:12,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5102050.0, ans=0.125 2024-08-21 05:27:18,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2024-08-21 05:27:19,958 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 from AS 2024-08-21 05:27:21,532 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 from AS 2024-08-21 05:27:28,700 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 16 from LS+wenet, 15 from Vox, 31 from AS 2024-08-21 05:27:37,083 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6450, loss[loss=0.1037, beats_loss=0.01084, ecapa_loss=0.0001268, whisper_loss=0.09162, over 22732.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001378, whisper_loss=0.08942, over 3831586.91 frames. 
], batch size: 90, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:27:45,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=5102250.0, ans=0.2 2024-08-21 05:27:50,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5102250.0, ans=0.2 2024-08-21 05:27:51,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5102250.0, ans=0.125 2024-08-21 05:28:20,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5102450.0, ans=0.125 2024-08-21 05:28:34,916 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.209e+01 2.498e+01 2.911e+01 8.879e+02, threshold=4.995e+01, percent-clipped=1.0 2024-08-21 05:28:52,756 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.609e-03 2024-08-21 05:28:56,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5102650.0, ans=0.5 2024-08-21 05:29:07,902 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6500, loss[loss=0.061, beats_loss=0.01527, ecapa_loss=9.339e-05, whisper_loss=0.04479, over 12991.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01053, ecapa_loss=0.0001381, whisper_loss=0.08909, over 3791030.22 frames. ], batch size: 52, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:29:24,682 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 17 from Vox, 30 from AS 2024-08-21 05:29:35,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5102850.0, ans=0.0 2024-08-21 05:29:54,660 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
30 from LS+wenet, 28 from Vox, 30 from AS 2024-08-21 05:30:21,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5103050.0, ans=0.0 2024-08-21 05:30:23,678 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 from AS 2024-08-21 05:30:25,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5103050.0, ans=0.1 2024-08-21 05:30:33,854 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 23 from LS+wenet, 20 from Vox, 41 from AS 2024-08-21 05:30:34,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5103150.0, ans=0.2 2024-08-21 05:30:43,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-08-21 05:30:49,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5103250.0, ans=0.125 2024-08-21 05:30:50,204 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6550, loss[loss=0.1144, beats_loss=0.009358, ecapa_loss=0.0001527, whisper_loss=0.1035, over 21828.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.000139, whisper_loss=0.08948, over 3810779.07 frames. ], batch size: 88, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:31:01,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5103250.0, ans=0.125 2024-08-21 05:31:04,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5103250.0, ans=0.125 2024-08-21 05:31:17,482 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
31 from LS+wenet, 21 from Vox, 42 from AS 2024-08-21 05:31:27,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5103350.0, ans=0.125 2024-08-21 05:31:29,343 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 05:31:43,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5103450.0, ans=0.2 2024-08-21 05:31:52,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5103450.0, ans=0.1 2024-08-21 05:31:54,227 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 from AS 2024-08-21 05:31:59,039 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.280e+01 2.467e+01 2.785e+01 3.437e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-21 05:32:06,529 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 26 from LS+wenet, 19 from Vox, 39 from AS 2024-08-21 05:32:19,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5103650.0, ans=0.125 2024-08-21 05:32:21,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-21 05:32:25,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5103650.0, ans=0.125 2024-08-21 05:32:27,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. 
limit=15.0 2024-08-21 05:32:33,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2024-08-21 05:32:34,103 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6600, loss[loss=0.1172, beats_loss=0.00949, ecapa_loss=0.0001269, whisper_loss=0.1064, over 16227.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.08939, over 3842400.18 frames. ], batch size: 63, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:32:48,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5103750.0, ans=0.125 2024-08-21 05:32:49,872 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 40 from LS+wenet, 15 from Vox, 37 from AS 2024-08-21 05:33:13,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5103950.0, ans=0.0 2024-08-21 05:33:16,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5103950.0, ans=0.0 2024-08-21 05:33:59,002 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 from AS 2024-08-21 05:34:04,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5104150.0, ans=0.1 2024-08-21 05:34:06,642 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 from AS 2024-08-21 05:34:08,534 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS 2024-08-21 05:34:13,041 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6650, loss[loss=0.1004, beats_loss=0.01055, ecapa_loss=0.0001368, whisper_loss=0.08846, over 17881.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001389, whisper_loss=0.09005, over 3834080.64 frames. ], batch size: 72, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:34:14,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5104250.0, ans=0.125 2024-08-21 05:34:33,777 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 22 from LS+wenet, 21 from Vox, 30 from AS 2024-08-21 05:34:40,395 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 from AS 2024-08-21 05:34:55,467 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 21 from Vox, 44 from AS 2024-08-21 05:34:55,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5104450.0, ans=0.125 2024-08-21 05:35:15,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.320e+01 2.541e+01 2.816e+01 4.422e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-21 05:35:33,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.18 vs. limit=22.5 2024-08-21 05:35:50,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5104750.0, ans=0.0 2024-08-21 05:35:51,147 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6700, loss[loss=0.1042, beats_loss=0.01161, ecapa_loss=0.0001213, whisper_loss=0.09134, over 18747.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001387, whisper_loss=0.09036, over 3857386.21 frames. 
], batch size: 75, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:36:22,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5104850.0, ans=0.5 2024-08-21 05:36:30,396 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 from AS 2024-08-21 05:36:33,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5104950.0, ans=0.125 2024-08-21 05:36:35,465 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 from AS 2024-08-21 05:36:42,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5104950.0, ans=0.125 2024-08-21 05:36:44,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2024-08-21 05:37:03,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5105050.0, ans=0.125 2024-08-21 05:37:04,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-21 05:37:05,296 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
32 from LS+wenet, 28 from Vox, 31 from AS 2024-08-21 05:37:05,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5105050.0, ans=0.125 2024-08-21 05:37:21,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5105150.0, ans=0.2 2024-08-21 05:37:27,726 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6750, loss[loss=0.09693, beats_loss=0.01181, ecapa_loss=0.0001265, whisper_loss=0.08385, over 18043.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.0001381, whisper_loss=0.09113, over 3890406.39 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:38:17,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5105450.0, ans=0.1 2024-08-21 05:38:17,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5105450.0, ans=0.04949747468305833 2024-08-21 05:38:21,074 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 from AS 2024-08-21 05:38:25,765 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.375e+01 2.599e+01 2.848e+01 3.757e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-21 05:38:27,117 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 21 from LS+wenet, 13 from Vox, 32 from AS 2024-08-21 05:38:31,240 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 21 from LS+wenet, 21 from Vox, 32 from AS 2024-08-21 05:38:38,643 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 from AS 2024-08-21 05:38:49,462 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
23 from LS+wenet, 29 from Vox, 39 from AS 2024-08-21 05:38:58,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5105750.0, ans=0.125 2024-08-21 05:38:59,459 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6800, loss[loss=0.06677, beats_loss=0.009448, ecapa_loss=0.0001946, whisper_loss=0.05537, over 12156.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001375, whisper_loss=0.09086, over 3875472.09 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:39:06,208 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 from AS 2024-08-21 05:39:17,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=22.11 vs. limit=22.5 2024-08-21 05:39:38,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=12.0 2024-08-21 05:39:39,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2024-08-21 05:39:48,343 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 from AS 2024-08-21 05:39:49,748 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 from AS 2024-08-21 05:40:15,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=5106150.0, ans=0.0 2024-08-21 05:40:18,991 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 19 from LS+wenet, 20 from Vox, 24 from AS 2024-08-21 05:40:33,862 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6850, loss[loss=0.1014, beats_loss=0.009566, ecapa_loss=9.972e-05, whisper_loss=0.09081, over 20634.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01031, ecapa_loss=0.0001375, whisper_loss=0.09101, over 3854067.46 frames. ], batch size: 76, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:40:40,135 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-21 05:40:42,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5106250.0, ans=0.2 2024-08-21 05:40:53,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5106350.0, ans=0.2 2024-08-21 05:41:06,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=12.0 2024-08-21 05:41:17,274 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 21 from LS+wenet, 25 from Vox, 28 from AS 2024-08-21 05:41:29,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=5106550.0, ans=0.0 2024-08-21 05:41:32,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.327e+01 2.654e+01 2.944e+01 2.744e+02, threshold=5.308e+01, percent-clipped=2.0 2024-08-21 05:41:33,579 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 11 from Vox, 41 from AS 2024-08-21 05:41:52,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5106650.0, ans=0.0 2024-08-21 05:41:55,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5106650.0, ans=0.125 2024-08-21 05:42:05,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.40 vs. 
limit=15.0 2024-08-21 05:42:05,956 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6900, loss[loss=0.09383, beats_loss=0.01092, ecapa_loss=0.0001261, whisper_loss=0.08165, over 16266.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01039, ecapa_loss=0.0001376, whisper_loss=0.09065, over 3871287.83 frames. ], batch size: 64, lr: 1.75e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:42:17,439 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 from AS 2024-08-21 05:42:27,793 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 16 from Vox, 42 from AS 2024-08-21 05:42:36,556 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 from AS 2024-08-21 05:42:45,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5106950.0, ans=0.0 2024-08-21 05:42:45,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5106950.0, ans=0.125 2024-08-21 05:42:47,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5106950.0, ans=0.035 2024-08-21 05:42:49,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=5106950.0, ans=0.2 2024-08-21 05:42:58,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5107050.0, ans=0.125 2024-08-21 05:42:59,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5107050.0, ans=0.125 2024-08-21 05:43:12,813 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
26 from LS+wenet, 22 from Vox, 32 from AS 2024-08-21 05:43:34,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.51 vs. limit=22.5 2024-08-21 05:43:35,498 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 6950, loss[loss=0.1204, beats_loss=0.008619, ecapa_loss=0.0001542, whisper_loss=0.1103, over 20095.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.09053, over 3843378.73 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:43:46,870 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 from AS 2024-08-21 05:43:57,286 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 from AS 2024-08-21 05:44:23,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5107450.0, ans=0.015 2024-08-21 05:44:24,653 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 from AS 2024-08-21 05:44:32,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.277e+01 2.529e+01 2.921e+01 4.469e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 05:45:06,284 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7000, loss[loss=0.09519, beats_loss=0.01017, ecapa_loss=0.0001393, whisper_loss=0.08362, over 21756.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001383, whisper_loss=0.09036, over 3860123.73 frames. 
], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:45:21,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5107750.0, ans=0.125 2024-08-21 05:45:22,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=5107750.0, ans=15.0 2024-08-21 05:45:32,447 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 from AS 2024-08-21 05:46:01,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5107950.0, ans=0.0 2024-08-21 05:46:24,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5108150.0, ans=0.0 2024-08-21 05:46:38,981 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7050, loss[loss=0.1084, beats_loss=0.009657, ecapa_loss=0.0001596, whisper_loss=0.09717, over 21070.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.0001383, whisper_loss=0.09083, over 3851800.42 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:46:39,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5108250.0, ans=0.125 2024-08-21 05:46:56,214 WARNING [optim.py:496] (0/4) Scaling gradients by 0.049437928944826126, model_norm_threshold=50.57056427001953 2024-08-21 05:46:56,386 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.007e+05, grad_sumsq=1.862e+07, orig_rms_sq=1.077e-02 2024-08-21 05:47:04,296 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 28 from LS+wenet, 25 from Vox, 28 from AS 2024-08-21 05:47:19,374 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
21 from LS+wenet, 14 from Vox, 40 from AS 2024-08-21 05:47:26,628 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 15 from LS+wenet, 20 from Vox, 31 from AS 2024-08-21 05:47:34,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5108550.0, ans=0.0 2024-08-21 05:47:34,953 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.310e+01 2.518e+01 2.782e+01 1.023e+03, threshold=5.036e+01, percent-clipped=2.0 2024-08-21 05:47:53,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=5108650.0, ans=15.0 2024-08-21 05:47:58,433 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07700152695178986, model_norm_threshold=50.36127471923828 2024-08-21 05:47:58,607 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.694e+04, grad_sumsq=7.694e+04, orig_rms_sq=1.000e+00 2024-08-21 05:48:07,497 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7100, loss[loss=0.1269, beats_loss=0.007958, ecapa_loss=0.0001536, whisper_loss=0.1174, over 22918.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001378, whisper_loss=0.09039, over 3816605.72 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:48:21,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.86 vs. 
limit=10.0 2024-08-21 05:48:24,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5108850.0, ans=0.1 2024-08-21 05:49:05,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5109050.0, ans=0.2 2024-08-21 05:49:05,965 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.002e+01 2024-08-21 05:49:36,248 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7150, loss[loss=0.096, beats_loss=0.009914, ecapa_loss=0.0001823, whisper_loss=0.08426, over 20545.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.000138, whisper_loss=0.09038, over 3824707.84 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:49:39,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2024-08-21 05:49:52,546 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 from AS 2024-08-21 05:49:59,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5109350.0, ans=0.0 2024-08-21 05:50:21,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5109450.0, ans=0.125 2024-08-21 05:50:32,479 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.268e+01 2.455e+01 2.616e+01 6.540e+02, threshold=4.909e+01, percent-clipped=2.0 2024-08-21 05:50:36,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5109550.0, ans=0.1 2024-08-21 05:50:50,538 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
17 from LS+wenet, 27 from Vox, 34 from AS 2024-08-21 05:51:05,959 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7200, loss[loss=0.1103, beats_loss=0.009739, ecapa_loss=0.0001438, whisper_loss=0.09913, over 22608.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001377, whisper_loss=0.08995, over 3825338.08 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:51:13,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=5109750.0, ans=15.0 2024-08-21 05:51:26,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5109850.0, ans=0.125 2024-08-21 05:51:32,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5109850.0, ans=0.125 2024-08-21 05:51:42,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5109950.0, ans=0.0 2024-08-21 05:51:47,926 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
21 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-21 05:51:49,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5109950.0, ans=0.125 2024-08-21 05:51:49,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5109950.0, ans=0.1 2024-08-21 05:52:25,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5110150.0, ans=0.125 2024-08-21 05:52:27,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5110150.0, ans=0.1 2024-08-21 05:52:27,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.02 vs. limit=15.0 2024-08-21 05:52:31,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0 2024-08-21 05:52:37,643 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7250, loss[loss=0.1092, beats_loss=0.01066, ecapa_loss=0.0001279, whisper_loss=0.09728, over 19607.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001384, whisper_loss=0.09014, over 3817364.23 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:53:17,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=5110450.0, ans=0.125 2024-08-21 05:53:25,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5110450.0, ans=0.1 2024-08-21 05:53:32,343 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 
18 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-21 05:53:35,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.241e+01 2.447e+01 2.768e+01 8.311e+01, threshold=4.894e+01, percent-clipped=2.0 2024-08-21 05:53:36,061 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 23 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-21 05:53:45,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5110550.0, ans=10.0 2024-08-21 05:53:54,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5110650.0, ans=0.95 2024-08-21 05:54:07,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5110750.0, ans=0.125 2024-08-21 05:54:07,923 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7300, loss[loss=0.1027, beats_loss=0.01028, ecapa_loss=0.000133, whisper_loss=0.09105, over 19696.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001391, whisper_loss=0.09026, over 3814187.18 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:54:13,219 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-21 05:54:26,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5110850.0, ans=0.125 2024-08-21 05:54:51,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.54 vs. limit=15.0 2024-08-21 05:55:08,728 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 05:55:19,480 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 
15 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-21 05:55:31,912 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-21 05:55:32,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5111150.0, ans=0.0 2024-08-21 05:55:36,594 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7350, loss[loss=0.1213, beats_loss=0.008263, ecapa_loss=0.000166, whisper_loss=0.1113, over 19481.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001391, whisper_loss=0.09038, over 3840613.72 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:55:48,057 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 30 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-21 05:55:51,836 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 30 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-21 05:56:09,868 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 20 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-21 05:56:25,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5111450.0, ans=0.125 2024-08-21 05:56:33,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.293e+01 2.578e+01 2.831e+01 4.096e+01, threshold=5.157e+01, percent-clipped=0.0 2024-08-21 05:56:33,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5111550.0, ans=0.0 2024-08-21 05:56:39,336 INFO [train_multi_KD3.py:845] (0/4) A total of 95 cuts. 27 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-21 05:56:52,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2024-08-21 05:56:53,596 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 
29 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-21 05:57:01,931 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 9 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 05:57:04,678 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7400, loss[loss=0.1032, beats_loss=0.009847, ecapa_loss=0.0001767, whisper_loss=0.09162, over 21169.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01035, ecapa_loss=0.0001394, whisper_loss=0.09006, over 3846915.72 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:57:23,431 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.489e+01 2024-08-21 05:57:35,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5111850.0, ans=0.1 2024-08-21 05:57:36,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5111850.0, ans=0.125 2024-08-21 05:57:40,927 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08977154642343521, model_norm_threshold=51.56612014770508 2024-08-21 05:57:41,098 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.273e+04, grad_sumsq=4.273e+04, orig_rms_sq=1.000e+00 2024-08-21 05:57:59,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5112050.0, ans=0.125 2024-08-21 05:58:07,108 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 05:58:34,087 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7450, loss[loss=0.1238, beats_loss=0.008946, ecapa_loss=0.0001435, whisper_loss=0.1134, over 23585.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001401, whisper_loss=0.09037, over 3823114.17 frames. 
], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 05:58:47,631 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.741e+01 2024-08-21 05:59:05,254 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 05:59:07,065 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 24 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 05:59:23,321 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 19 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-21 05:59:31,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.315e+01 2.613e+01 3.029e+01 5.744e+02, threshold=5.226e+01, percent-clipped=1.0 2024-08-21 05:59:51,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5112650.0, ans=0.125 2024-08-21 06:00:03,585 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7500, loss[loss=0.1001, beats_loss=0.01262, ecapa_loss=0.0001366, whisper_loss=0.08613, over 21035.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01031, ecapa_loss=0.0001406, whisper_loss=0.09007, over 3819660.89 frames. ], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:00:26,382 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 06:00:33,362 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 23 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 06:00:40,611 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-21 06:00:44,664 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-21 06:01:00,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. 
limit=15.0 2024-08-21 06:01:16,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5113150.0, ans=0.125 2024-08-21 06:01:17,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5113150.0, ans=0.125 2024-08-21 06:01:30,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2024-08-21 06:01:33,393 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 21 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-21 06:01:34,459 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7550, loss[loss=0.1022, beats_loss=0.01078, ecapa_loss=0.0001352, whisper_loss=0.09008, over 15391.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.00014, whisper_loss=0.08975, over 3819216.33 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 06:01:39,688 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-21 06:01:59,865 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-21 06:02:01,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5113350.0, ans=0.125 2024-08-21 06:02:23,632 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-21 06:02:36,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.241e+01 2.500e+01 2.791e+01 3.634e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-21 06:02:48,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5113650.0, ans=0.1 2024-08-21 06:03:05,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5113650.0, ans=0.2 2024-08-21 06:03:07,992 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7600, loss[loss=0.1028, beats_loss=0.009864, ecapa_loss=0.0001429, whisper_loss=0.09155, over 19869.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001403, whisper_loss=0.08961, over 3836114.20 frames. ], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:03:44,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5113950.0, ans=0.0 2024-08-21 06:03:51,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5113950.0, ans=0.2 2024-08-21 06:03:53,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5113950.0, ans=0.0 2024-08-21 06:04:01,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5113950.0, ans=0.1 2024-08-21 06:04:23,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5114150.0, ans=0.0 2024-08-21 06:04:25,087 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 
25 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 06:04:29,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5114150.0, ans=0.1 2024-08-21 06:04:37,245 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 06:04:37,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5114150.0, ans=0.0 2024-08-21 06:04:37,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=12.0 2024-08-21 06:04:42,291 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7650, loss[loss=0.08326, beats_loss=0.01121, ecapa_loss=0.0001315, whisper_loss=0.07073, over 21707.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001396, whisper_loss=0.0903, over 3839212.99 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:04:48,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5114250.0, ans=0.0 2024-08-21 06:04:50,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5114250.0, ans=0.125 2024-08-21 06:05:01,187 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-21 06:05:15,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5114350.0, ans=0.125 2024-08-21 06:05:32,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5114450.0, ans=0.1 2024-08-21 06:05:34,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5114450.0, ans=0.0 2024-08-21 06:05:42,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5114550.0, ans=0.1 2024-08-21 06:05:42,897 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.284e+01 2.478e+01 2.742e+01 4.351e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-21 06:05:47,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5114550.0, ans=0.1 2024-08-21 06:05:58,191 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-21 06:06:04,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2024-08-21 06:06:13,471 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7700, loss[loss=0.1009, beats_loss=0.01025, ecapa_loss=0.0001575, whisper_loss=0.08904, over 17285.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01035, ecapa_loss=0.0001389, whisper_loss=0.09019, over 3785189.57 frames. 
], batch size: 72, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:06:17,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=5114750.0, ans=0.015 2024-08-21 06:06:37,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5114850.0, ans=0.125 2024-08-21 06:06:40,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2024-08-21 06:06:42,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-08-21 06:06:44,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5114850.0, ans=0.125 2024-08-21 06:07:21,984 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 31 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-21 06:07:39,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-21 06:07:46,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5115150.0, ans=0.0 2024-08-21 06:07:58,856 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7750, loss[loss=0.09609, beats_loss=0.01301, ecapa_loss=0.0001025, whisper_loss=0.08206, over 23990.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01042, ecapa_loss=0.0001382, whisper_loss=0.08981, over 3798593.16 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:08:37,639 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 38 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-21 06:08:39,414 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-21 06:08:45,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2024-08-21 06:08:58,428 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 30 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-21 06:09:01,795 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-21 06:09:03,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.270e+01 2.577e+01 2.902e+01 8.135e+01, threshold=5.155e+01, percent-clipped=1.0 2024-08-21 06:09:32,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5115650.0, ans=0.125 2024-08-21 06:09:34,806 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7800, loss[loss=0.09429, beats_loss=0.009256, ecapa_loss=0.0001239, whisper_loss=0.08379, over 18120.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001375, whisper_loss=0.09, over 3788498.89 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:09:35,423 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-21 06:09:36,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=22.5 2024-08-21 06:10:19,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. 
limit=6.0 2024-08-21 06:10:31,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5116050.0, ans=0.125 2024-08-21 06:10:39,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=5116050.0, ans=0.125 2024-08-21 06:10:43,378 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 06:11:06,791 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-21 06:11:10,104 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7850, loss[loss=0.1047, beats_loss=0.009304, ecapa_loss=0.0001371, whisper_loss=0.09403, over 21211.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01049, ecapa_loss=0.0001372, whisper_loss=0.0892, over 3803052.57 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:11:15,399 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-21 06:11:45,893 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-21 06:12:02,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5116550.0, ans=0.125 2024-08-21 06:12:04,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=5116550.0, ans=0.2 2024-08-21 06:12:08,965 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.271e+01 2.419e+01 2.702e+01 3.999e+01, threshold=4.838e+01, percent-clipped=0.0 2024-08-21 06:12:11,515 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 06:12:11,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=5116550.0, ans=0.125 2024-08-21 06:12:13,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5116550.0, ans=0.1 2024-08-21 06:12:26,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2024-08-21 06:12:36,704 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-21 06:12:40,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-21 06:12:41,173 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7900, loss[loss=0.08109, beats_loss=0.0137, ecapa_loss=0.0001266, whisper_loss=0.06613, over 21715.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001381, whisper_loss=0.08964, over 3814113.70 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:13:01,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5116850.0, ans=0.125 2024-08-21 06:13:27,944 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 06:13:37,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5117050.0, ans=0.125 2024-08-21 06:13:42,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5117050.0, ans=0.0 2024-08-21 06:13:53,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.31 vs. limit=10.0 2024-08-21 06:14:03,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5117150.0, ans=0.125 2024-08-21 06:14:10,366 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 7950, loss[loss=0.1094, beats_loss=0.008889, ecapa_loss=0.0001618, whisper_loss=0.09891, over 18851.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001388, whisper_loss=0.09016, over 3827167.98 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:14:26,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.53 vs. 
limit=22.5 2024-08-21 06:14:37,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5117350.0, ans=0.125 2024-08-21 06:14:44,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5117450.0, ans=0.1 2024-08-21 06:14:51,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=5117450.0, ans=0.125 2024-08-21 06:14:54,219 WARNING [optim.py:496] (0/4) Scaling gradients by 0.07848511636257172, model_norm_threshold=48.37834167480469 2024-08-21 06:14:54,389 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.905e+04, grad_sumsq=8.269e+06, orig_rms_sq=1.077e-02 2024-08-21 06:15:06,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.240e+01 2.515e+01 2.710e+01 6.164e+02, threshold=5.030e+01, percent-clipped=3.0 2024-08-21 06:15:11,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-08-21 06:15:23,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5117650.0, ans=0.1 2024-08-21 06:15:37,035 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8000, loss[loss=0.1047, beats_loss=0.008035, ecapa_loss=0.0001752, whisper_loss=0.09491, over 12697.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001381, whisper_loss=0.09049, over 3824858.96 frames. 
], batch size: 50, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:15:39,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5117750.0, ans=0.125 2024-08-21 06:16:09,121 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 06:16:12,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5117950.0, ans=0.125 2024-08-21 06:16:13,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5117950.0, ans=0.0 2024-08-21 06:16:19,911 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 06:16:33,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5118050.0, ans=0.125 2024-08-21 06:16:49,555 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:16:49,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5118150.0, ans=0.125 2024-08-21 06:16:54,758 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
22 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-21 06:16:56,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5118150.0, ans=0.125 2024-08-21 06:17:00,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5118150.0, ans=0.125 2024-08-21 06:17:03,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5118150.0, ans=0.2 2024-08-21 06:17:05,642 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8050, loss[loss=0.1011, beats_loss=0.01299, ecapa_loss=0.0001348, whisper_loss=0.08678, over 21952.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001388, whisper_loss=0.09057, over 3817545.90 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:17:16,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5118250.0, ans=0.125 2024-08-21 06:17:30,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5118350.0, ans=0.1 2024-08-21 06:17:50,736 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.789e+00 2024-08-21 06:17:52,363 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 17 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 06:17:57,431 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 06:17:57,663 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.971e+05 2024-08-21 06:18:03,337 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.278e+01 2.668e+01 2.870e+01 8.505e+01, threshold=5.336e+01, percent-clipped=1.0 2024-08-21 06:18:12,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5118550.0, ans=0.125 2024-08-21 06:18:14,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.89 vs. limit=10.0 2024-08-21 06:18:21,679 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 22 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-21 06:18:32,400 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 06:18:34,978 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8100, loss[loss=0.08994, beats_loss=0.014, ecapa_loss=0.0001207, whisper_loss=0.07474, over 17047.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001381, whisper_loss=0.09027, over 3855015.89 frames. ], batch size: 71, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:18:35,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5118750.0, ans=0.125 2024-08-21 06:19:30,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5119050.0, ans=0.0 2024-08-21 06:19:33,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-08-21 06:19:39,256 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
31 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 06:19:48,670 WARNING [optim.py:496] (0/4) Scaling gradients by 0.08069697767496109, model_norm_threshold=53.36321258544922 2024-08-21 06:19:48,840 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.079e+04, grad_sumsq=7.079e+04, orig_rms_sq=1.000e+00 2024-08-21 06:19:54,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=5119150.0, ans=0.125 2024-08-21 06:20:02,410 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8150, loss[loss=0.1011, beats_loss=0.01178, ecapa_loss=0.0001269, whisper_loss=0.08807, over 18992.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001377, whisper_loss=0.09037, over 3849145.12 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:20:09,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5119250.0, ans=0.0 2024-08-21 06:20:48,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5119450.0, ans=0.0 2024-08-21 06:20:58,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.296e+01 2.463e+01 2.711e+01 6.613e+02, threshold=4.926e+01, percent-clipped=2.0 2024-08-21 06:21:04,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5119550.0, ans=0.125 2024-08-21 06:21:15,066 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 06:21:27,803 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8200, loss[loss=0.0992, beats_loss=0.01298, ecapa_loss=0.0001022, whisper_loss=0.08519, over 20409.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001371, whisper_loss=0.09062, over 3849882.99 frames. ], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:21:50,076 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 25 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-21 06:22:09,868 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-512000.pt 2024-08-21 06:22:22,478 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 06:22:57,645 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8250, loss[loss=0.113, beats_loss=0.01003, ecapa_loss=0.0001417, whisper_loss=0.1016, over 22080.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01046, ecapa_loss=0.0001373, whisper_loss=0.09134, over 3842648.70 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:22:59,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2024-08-21 06:23:00,943 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.690e+00 2024-08-21 06:23:09,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5120250.0, ans=0.1 2024-08-21 06:23:33,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5120450.0, ans=0.0 2024-08-21 06:23:44,963 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 06:23:54,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.309e+01 2.543e+01 2.823e+01 1.094e+02, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 06:24:25,029 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8300, loss[loss=0.1015, beats_loss=0.01115, ecapa_loss=0.0001018, whisper_loss=0.08934, over 22761.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001371, whisper_loss=0.09039, over 3823029.63 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:24:48,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5120850.0, ans=0.2 2024-08-21 06:24:58,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5120850.0, ans=0.1 2024-08-21 06:25:25,560 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 18 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-21 06:25:28,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5121050.0, ans=0.125 2024-08-21 06:25:54,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=22.5 2024-08-21 06:25:56,027 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8350, loss[loss=0.09243, beats_loss=0.01104, ecapa_loss=0.000162, whisper_loss=0.07978, over 21889.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001381, whisper_loss=0.09012, over 3818178.01 frames. 
], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:26:24,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5121350.0, ans=0.125 2024-08-21 06:26:27,032 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-21 06:26:51,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=6.21 vs. limit=12.0 2024-08-21 06:26:52,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=5121550.0, ans=0.0 2024-08-21 06:26:55,616 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0374552384018898, model_norm_threshold=50.851959228515625 2024-08-21 06:26:55,786 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.895e+05, grad_sumsq=1.757e+07, orig_rms_sq=1.078e-02 2024-08-21 06:26:59,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.288e+01 2.464e+01 2.782e+01 1.358e+03, threshold=4.928e+01, percent-clipped=2.0 2024-08-21 06:27:02,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-21 06:27:12,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5121650.0, ans=0.125 2024-08-21 06:27:24,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5121650.0, ans=0.125 2024-08-21 06:27:26,392 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 06:27:30,538 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
25 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-21 06:27:33,207 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8400, loss[loss=0.1135, beats_loss=0.01001, ecapa_loss=0.0001792, whisper_loss=0.1017, over 21852.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001379, whisper_loss=0.08982, over 3839421.87 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:27:34,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5121750.0, ans=0.07 2024-08-21 06:27:34,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5121750.0, ans=0.1 2024-08-21 06:27:38,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5121750.0, ans=0.125 2024-08-21 06:27:43,466 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 27 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-21 06:27:45,557 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 22 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-21 06:28:01,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.34 vs. limit=22.5 2024-08-21 06:28:37,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5122050.0, ans=0.0 2024-08-21 06:28:46,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.18 vs. 
limit=15.0 2024-08-21 06:28:59,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5122150.0, ans=0.0 2024-08-21 06:29:04,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-21 06:29:06,827 WARNING [optim.py:496] (0/4) Scaling gradients by 0.038237348198890686, model_norm_threshold=49.277313232421875 2024-08-21 06:29:06,996 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.0.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.805e+05, grad_sumsq=1.805e+05, orig_rms_sq=1.000e+00 2024-08-21 06:29:07,039 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8450, loss[loss=0.101, beats_loss=0.01068, ecapa_loss=0.0001406, whisper_loss=0.08892, over 16000.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.08989, over 3819212.91 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:29:11,410 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-21 06:29:22,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-08-21 06:29:38,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5122350.0, ans=0.125 2024-08-21 06:30:11,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.430e+01 2.632e+01 3.117e+01 1.289e+03, threshold=5.264e+01, percent-clipped=4.0 2024-08-21 06:30:13,142 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 06:30:17,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-08-21 06:30:32,828 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 21 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-21 06:30:39,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=22.5 2024-08-21 06:30:46,101 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8500, loss[loss=0.1022, beats_loss=0.01011, ecapa_loss=0.0001568, whisper_loss=0.09048, over 22046.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001392, whisper_loss=0.08955, over 3814933.89 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:30:51,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5122750.0, ans=0.0 2024-08-21 06:30:57,147 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-21 06:31:10,882 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 06:31:15,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5122850.0, ans=0.2 2024-08-21 06:31:19,054 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
28 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-21 06:31:46,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5123050.0, ans=0.0 2024-08-21 06:31:59,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5123050.0, ans=0.125 2024-08-21 06:32:16,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.38 vs. limit=22.5 2024-08-21 06:32:19,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5123150.0, ans=0.125 2024-08-21 06:32:21,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5123150.0, ans=0.125 2024-08-21 06:32:24,512 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8550, loss[loss=0.1083, beats_loss=0.01023, ecapa_loss=0.0001015, whisper_loss=0.09709, over 18966.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001394, whisper_loss=0.08972, over 3802081.13 frames. 
], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:32:29,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5123250.0, ans=0.0 2024-08-21 06:32:31,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=5123250.0, ans=0.0 2024-08-21 06:33:30,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.375e+01 2.634e+01 2.950e+01 1.431e+02, threshold=5.267e+01, percent-clipped=1.0 2024-08-21 06:33:35,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5123550.0, ans=0.0 2024-08-21 06:33:37,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5123550.0, ans=0.125 2024-08-21 06:33:51,324 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-21 06:33:57,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5123650.0, ans=0.125 2024-08-21 06:34:02,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5123650.0, ans=0.2 2024-08-21 06:34:04,709 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8600, loss[loss=0.09493, beats_loss=0.009362, ecapa_loss=0.0001459, whisper_loss=0.0841, over 17459.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01029, ecapa_loss=0.000139, whisper_loss=0.09, over 3804191.49 frames. 
], batch size: 69, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:34:21,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5123750.0, ans=0.0 2024-08-21 06:34:24,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5123850.0, ans=0.2 2024-08-21 06:34:26,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5123850.0, ans=0.125 2024-08-21 06:34:28,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5123850.0, ans=0.0 2024-08-21 06:34:34,105 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 06:34:36,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5123850.0, ans=0.125 2024-08-21 06:34:54,435 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-21 06:35:38,242 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-21 06:35:40,468 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-21 06:35:43,141 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8650, loss[loss=0.08737, beats_loss=0.01214, ecapa_loss=0.0001247, whisper_loss=0.07398, over 15265.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01032, ecapa_loss=0.0001387, whisper_loss=0.09004, over 3804346.32 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:35:52,989 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-21 06:35:55,044 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
14 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-21 06:35:58,773 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-21 06:36:08,089 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-21 06:36:19,099 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 28 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-21 06:36:21,438 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 21 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-21 06:36:31,779 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 06:36:33,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-21 06:36:47,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.320e+01 2.649e+01 2.917e+01 4.406e+01, threshold=5.297e+01, percent-clipped=0.0 2024-08-21 06:36:53,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5124550.0, ans=0.125 2024-08-21 06:36:57,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5124550.0, ans=0.125 2024-08-21 06:37:18,893 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-21 06:37:21,526 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 35 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-21 06:37:24,442 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8700, loss[loss=0.09302, beats_loss=0.009178, ecapa_loss=0.000121, whisper_loss=0.08263, over 14511.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001381, whisper_loss=0.08961, over 3829878.09 frames. 
], batch size: 54, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:37:29,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=5124750.0, ans=0.2 2024-08-21 06:37:37,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5124750.0, ans=0.125 2024-08-21 06:37:40,904 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-21 06:37:52,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5124850.0, ans=0.2 2024-08-21 06:37:54,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-08-21 06:38:26,806 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 25 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-21 06:38:32,461 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-21 06:39:00,039 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8750, loss[loss=0.1203, beats_loss=0.008473, ecapa_loss=0.0001718, whisper_loss=0.1101, over 21905.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001388, whisper_loss=0.08991, over 3815942.78 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:39:01,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=5125250.0, ans=0.2 2024-08-21 06:39:02,778 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 
18 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-21 06:39:02,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5125250.0, ans=0.1 2024-08-21 06:39:03,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.52 vs. limit=10.0 2024-08-21 06:39:35,639 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 06:39:50,457 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 20 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-21 06:40:01,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5125550.0, ans=0.2 2024-08-21 06:40:02,198 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.240e+01 2.524e+01 2.807e+01 1.444e+02, threshold=5.048e+01, percent-clipped=1.0 2024-08-21 06:40:26,147 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-21 06:40:32,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5125650.0, ans=0.0 2024-08-21 06:40:34,711 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8800, loss[loss=0.1044, beats_loss=0.008453, ecapa_loss=0.0001389, whisper_loss=0.09458, over 13537.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001386, whisper_loss=0.09029, over 3806718.63 frames. 
], batch size: 51, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:40:37,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5125750.0, ans=0.2 2024-08-21 06:40:44,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5125750.0, ans=0.07 2024-08-21 06:40:46,525 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 29 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-21 06:41:12,387 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 20 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-21 06:41:19,666 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 06:41:42,913 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.706e+00 2024-08-21 06:42:03,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5126250.0, ans=0.0 2024-08-21 06:42:03,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5126250.0, ans=0.125 2024-08-21 06:42:04,617 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8850, loss[loss=0.1315, beats_loss=0.01019, ecapa_loss=0.0001571, whisper_loss=0.1198, over 15746.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001375, whisper_loss=0.08931, over 3759448.78 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:42:05,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5126250.0, ans=0.125 2024-08-21 06:42:09,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5126250.0, ans=0.125 2024-08-21 06:42:20,196 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
24 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-21 06:42:45,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5126450.0, ans=0.0 2024-08-21 06:42:45,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5126450.0, ans=0.0 2024-08-21 06:43:07,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. limit=15.0 2024-08-21 06:43:11,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=12.0 2024-08-21 06:43:11,602 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.228e+01 2.528e+01 2.793e+01 5.836e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-21 06:43:34,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5126650.0, ans=0.1 2024-08-21 06:43:45,885 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8900, loss[loss=0.1139, beats_loss=0.008085, ecapa_loss=0.000171, whisper_loss=0.1041, over 16720.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.0001374, whisper_loss=0.08975, over 3784668.54 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:43:48,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5126750.0, ans=0.05 2024-08-21 06:43:50,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5126750.0, ans=0.0 2024-08-21 06:43:58,112 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-21 06:44:13,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.34 vs. limit=15.0 2024-08-21 06:44:20,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5126850.0, ans=0.0 2024-08-21 06:45:17,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5127250.0, ans=0.0 2024-08-21 06:45:18,666 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 8950, loss[loss=0.08737, beats_loss=0.01148, ecapa_loss=0.0001504, whisper_loss=0.07438, over 22109.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001377, whisper_loss=0.08965, over 3773108.58 frames. ], batch size: 96, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:45:21,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-21 06:45:31,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. 
limit=15.0 2024-08-21 06:45:38,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=5127350.0, ans=0.125 2024-08-21 06:45:51,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5127350.0, ans=0.125 2024-08-21 06:46:22,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5127550.0, ans=0.2 2024-08-21 06:46:23,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.641e+01 2.262e+01 2.428e+01 2.785e+01 3.880e+01, threshold=4.857e+01, percent-clipped=0.0 2024-08-21 06:46:29,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5127550.0, ans=0.95 2024-08-21 06:46:52,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5127650.0, ans=0.1 2024-08-21 06:46:56,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=5127750.0, ans=0.125 2024-08-21 06:46:56,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-21 06:46:56,919 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9000, loss[loss=0.08171, beats_loss=0.009958, ecapa_loss=0.000135, whisper_loss=0.0704, over 17298.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001381, whisper_loss=0.08953, over 3784805.23 frames. 
], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:46:56,920 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-21 06:47:34,667 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005065, whisper_loss=0.2487, over 931116.00 frames. 2024-08-21 06:47:57,453 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on SV_voxceleb1: loss=0.003886, beats_loss=0, ecapa_loss=0.0003886, whisper_loss=0, over 944235.00 frames. 2024-08-21 06:49:39,509 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on AT_audioset: loss=0.02296, beats_loss=0.02296, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 06:49:39,514 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-21 06:50:13,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5127950.0, ans=0.125 2024-08-21 06:50:17,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5127950.0, ans=0.0 2024-08-21 06:50:29,571 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-21 06:50:36,234 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 
22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 06:50:48,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5128150.0, ans=0.0 2024-08-21 06:50:48,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5128150.0, ans=0.125 2024-08-21 06:50:58,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5128150.0, ans=0.125 2024-08-21 06:50:58,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5128150.0, ans=0.0 2024-08-21 06:51:03,382 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 25 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-21 06:51:08,178 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9050, loss[loss=0.09182, beats_loss=0.01119, ecapa_loss=0.0001335, whisper_loss=0.0793, over 17569.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01052, ecapa_loss=0.0001376, whisper_loss=0.08866, over 3754521.86 frames. ], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:51:27,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5128350.0, ans=0.1 2024-08-21 06:51:48,965 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-21 06:51:52,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5128450.0, ans=0.0 2024-08-21 06:52:09,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. 
limit=22.5 2024-08-21 06:52:12,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.208e+01 2.419e+01 2.777e+01 1.932e+02, threshold=4.839e+01, percent-clipped=1.0 2024-08-21 06:52:16,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=5128550.0, ans=0.125 2024-08-21 06:52:32,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5128650.0, ans=0.125 2024-08-21 06:52:41,763 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9100, loss[loss=0.08974, beats_loss=0.01313, ecapa_loss=0.0001227, whisper_loss=0.07538, over 15425.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01057, ecapa_loss=0.0001368, whisper_loss=0.08876, over 3773308.95 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:52:54,910 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-21 06:53:06,122 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 06:53:39,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5129050.0, ans=0.125 2024-08-21 06:53:40,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2024-08-21 06:53:47,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5129050.0, ans=0.125 2024-08-21 06:54:03,211 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-21 06:54:05,289 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 
19 from LS+wenet, 26 from Vox, 32 from AS 2024-08-21 06:54:15,657 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9150, loss[loss=0.09591, beats_loss=0.01092, ecapa_loss=0.000129, whisper_loss=0.08371, over 16867.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.000137, whisper_loss=0.08952, over 3795174.75 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:54:27,079 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 34 from LS+wenet, 21 from Vox, 39 from AS 2024-08-21 06:54:32,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5129350.0, ans=0.0 2024-08-21 06:54:38,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5129350.0, ans=0.125 2024-08-21 06:54:42,913 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 20 from LS+wenet, 26 from Vox, 27 from AS 2024-08-21 06:54:43,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.55 vs. limit=10.0 2024-08-21 06:54:45,086 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 18 from LS+wenet, 28 from Vox, 34 from AS 2024-08-21 06:55:16,375 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.222e+01 2.469e+01 2.826e+01 4.057e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-21 06:55:35,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5129650.0, ans=0.09899494936611666 2024-08-21 06:55:46,854 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9200, loss[loss=0.08064, beats_loss=0.01293, ecapa_loss=0.0001658, whisper_loss=0.06606, over 19955.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001374, whisper_loss=0.08914, over 3801440.02 frames.
], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:55:59,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=5129750.0, ans=0.125 2024-08-21 06:56:13,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5129850.0, ans=0.125 2024-08-21 06:56:14,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5129850.0, ans=0.95 2024-08-21 06:56:30,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5129950.0, ans=0.125 2024-08-21 06:56:32,641 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 from AS 2024-08-21 06:56:46,830 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 16 from LS+wenet, 18 from Vox, 24 from AS 2024-08-21 06:57:08,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.68 vs. limit=15.0 2024-08-21 06:57:09,497 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 15 from LS+wenet, 15 from Vox, 34 from AS 2024-08-21 06:57:21,095 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 from AS 2024-08-21 06:57:22,057 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9250, loss[loss=0.1086, beats_loss=0.01076, ecapa_loss=0.0001562, whisper_loss=0.09631, over 22898.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.000138, whisper_loss=0.09, over 3800863.32 frames.
], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:57:23,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5130250.0, ans=0.1 2024-08-21 06:58:01,745 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 35 from LS+wenet, 24 from Vox, 33 from AS 2024-08-21 06:58:07,188 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 from AS 2024-08-21 06:58:18,426 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 36 from LS+wenet, 16 from Vox, 36 from AS 2024-08-21 06:58:20,693 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 20 from Vox, 45 from AS 2024-08-21 06:58:23,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.254e+01 2.574e+01 2.946e+01 4.918e+02, threshold=5.149e+01, percent-clipped=3.0 2024-08-21 06:58:32,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs.
limit=6.0 2024-08-21 06:58:40,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5130650.0, ans=0.125 2024-08-21 06:58:43,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5130650.0, ans=0.0 2024-08-21 06:58:45,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5130650.0, ans=0.125 2024-08-21 06:58:49,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5130650.0, ans=0.125 2024-08-21 06:58:54,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5130650.0, ans=0.2 2024-08-21 06:58:58,621 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9300, loss[loss=0.09207, beats_loss=0.007773, ecapa_loss=0.0001563, whisper_loss=0.08274, over 13602.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.000138, whisper_loss=0.09041, over 3806935.20 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 06:59:03,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=5130750.0, ans=0.125 2024-08-21 06:59:11,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5130750.0, ans=0.1 2024-08-21 06:59:24,169 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 
26 from LS+wenet, 20 from Vox, 32 from AS 2024-08-21 06:59:28,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5130850.0, ans=0.1 2024-08-21 06:59:31,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5130850.0, ans=0.1 2024-08-21 06:59:57,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-21 07:00:03,396 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 33 from LS+wenet, 23 from Vox, 37 from AS 2024-08-21 07:00:08,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5131050.0, ans=0.125 2024-08-21 07:00:12,614 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 from AS 2024-08-21 07:00:14,599 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 20 from LS+wenet, 18 from Vox, 33 from AS 2024-08-21 07:00:33,903 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9350, loss[loss=0.07892, beats_loss=0.01301, ecapa_loss=0.0001098, whisper_loss=0.06481, over 20305.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001379, whisper_loss=0.09042, over 3812762.56 frames. ], batch size: 79, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:00:38,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5131250.0, ans=0.125 2024-08-21 07:00:53,384 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts.
21 from LS+wenet, 23 from Vox, 42 from AS 2024-08-21 07:01:13,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5131450.0, ans=0.125 2024-08-21 07:01:19,548 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 from AS 2024-08-21 07:01:22,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2024-08-21 07:01:35,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.010e+01 2.276e+01 2.548e+01 2.859e+01 2.021e+02, threshold=5.096e+01, percent-clipped=1.0 2024-08-21 07:01:44,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. limit=10.0 2024-08-21 07:01:47,605 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.234e+00 2024-08-21 07:01:49,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2024-08-21 07:02:00,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5131650.0, ans=0.2 2024-08-21 07:02:07,480 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9400, loss[loss=0.1073, beats_loss=0.009161, ecapa_loss=0.0001567, whisper_loss=0.09655, over 22819.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001391, whisper_loss=0.09031, over 3822147.99 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:02:09,994 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts.
36 from LS+wenet, 18 from Vox, 35 from AS 2024-08-21 07:02:23,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5131750.0, ans=0.125 2024-08-21 07:02:28,718 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 from AS 2024-08-21 07:02:41,562 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 from AS 2024-08-21 07:02:49,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-21 07:03:06,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.82 vs. limit=5.0 2024-08-21 07:03:12,733 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 22 from LS+wenet, 14 from Vox, 25 from AS 2024-08-21 07:03:20,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5132150.0, ans=0.0 2024-08-21 07:03:31,246 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 19 from LS+wenet, 25 from Vox, 32 from AS 2024-08-21 07:03:33,253 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 from AS 2024-08-21 07:03:40,018 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9450, loss[loss=0.1006, beats_loss=0.009687, ecapa_loss=0.0001346, whisper_loss=0.08954, over 19202.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01056, ecapa_loss=0.0001388, whisper_loss=0.08932, over 3838534.56 frames.
], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:03:46,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5132250.0, ans=0.125 2024-08-21 07:04:03,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=5132350.0, ans=0.125 2024-08-21 07:04:12,676 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 14 from LS+wenet, 15 from Vox, 30 from AS 2024-08-21 07:04:40,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.244e+01 2.507e+01 2.864e+01 1.489e+02, threshold=5.014e+01, percent-clipped=2.0 2024-08-21 07:04:45,703 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 23 from LS+wenet, 18 from Vox, 18 from AS 2024-08-21 07:04:46,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2024-08-21 07:05:09,207 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.832e+05 2024-08-21 07:05:13,303 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9500, loss[loss=0.1347, beats_loss=0.007993, ecapa_loss=0.0001514, whisper_loss=0.1252, over 20608.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001392, whisper_loss=0.08983, over 3835559.96 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:05:23,397 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 23 from LS+wenet, 15 from Vox, 28 from AS 2024-08-21 07:05:36,346 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 from AS 2024-08-21 07:05:37,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs.
limit=15.0 2024-08-21 07:06:02,591 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 from AS 2024-08-21 07:06:04,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5132950.0, ans=0.125 2024-08-21 07:06:06,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5132950.0, ans=0.1 2024-08-21 07:06:19,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=5133050.0, ans=0.0 2024-08-21 07:06:22,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2024-08-21 07:06:31,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5133150.0, ans=0.125 2024-08-21 07:06:47,324 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9550, loss[loss=0.08504, beats_loss=0.01345, ecapa_loss=0.0001109, whisper_loss=0.07048, over 18807.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001386, whisper_loss=0.08943, over 3820007.75 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:07:05,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5133350.0, ans=0.0 2024-08-21 07:07:21,935 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 from AS 2024-08-21 07:07:27,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.62 vs.
limit=15.0 2024-08-21 07:07:49,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.309e+01 2.529e+01 2.824e+01 3.800e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-21 07:08:19,679 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9600, loss[loss=0.09053, beats_loss=0.007629, ecapa_loss=0.0001826, whisper_loss=0.08107, over 14212.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001392, whisper_loss=0.08958, over 3789140.56 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:08:33,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5133750.0, ans=0.125 2024-08-21 07:08:55,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5133950.0, ans=0.0 2024-08-21 07:09:06,989 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 23 from LS+wenet, 31 from Vox, 28 from AS 2024-08-21 07:09:12,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5134050.0, ans=0.1 2024-08-21 07:09:12,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5134050.0, ans=0.125 2024-08-21 07:09:21,129 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 from AS 2024-08-21 07:09:35,379 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 from AS 2024-08-21 07:09:48,710 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9650, loss[loss=0.09277, beats_loss=0.006872, ecapa_loss=0.0001317, whisper_loss=0.08458, over 14166.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.08914, over 3782029.53 frames.
], batch size: 53, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:09:54,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5134250.0, ans=0.0 2024-08-21 07:10:05,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5134350.0, ans=0.0 2024-08-21 07:10:05,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2024-08-21 07:10:31,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5134450.0, ans=0.125 2024-08-21 07:10:49,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.720e+01 2.246e+01 2.506e+01 2.851e+01 2.599e+02, threshold=5.012e+01, percent-clipped=4.0 2024-08-21 07:10:51,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5134550.0, ans=0.0 2024-08-21 07:11:19,417 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9700, loss[loss=0.09572, beats_loss=0.01027, ecapa_loss=0.0001946, whisper_loss=0.0835, over 14243.00 frames. ], tot_loss[loss=0.1, beats_loss=0.0104, ecapa_loss=0.0001395, whisper_loss=0.08823, over 3771493.83 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:11:45,908 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 19 from LS+wenet, 19 from Vox, 42 from AS 2024-08-21 07:12:05,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5134950.0, ans=0.125 2024-08-21 07:12:18,028 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts.
31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-21 07:12:50,851 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9750, loss[loss=0.1044, beats_loss=0.0114, ecapa_loss=0.0001354, whisper_loss=0.09166, over 23166.00 frames. ], tot_loss[loss=0.0994, beats_loss=0.01044, ecapa_loss=0.0001406, whisper_loss=0.08756, over 3756205.23 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:12:58,329 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 19 from Vox, 49 from AS 2024-08-21 07:13:12,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5135350.0, ans=0.1 2024-08-21 07:13:24,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=5135350.0, ans=0.125 2024-08-21 07:13:30,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5135450.0, ans=0.0 2024-08-21 07:13:44,164 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 20 from LS+wenet, 19 from Vox, 35 from AS 2024-08-21 07:13:52,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.258e+01 2.458e+01 2.685e+01 1.396e+02, threshold=4.917e+01, percent-clipped=1.0 2024-08-21 07:13:58,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5135550.0, ans=0.2 2024-08-21 07:14:20,951 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9800, loss[loss=0.0987, beats_loss=0.01015, ecapa_loss=0.0001189, whisper_loss=0.08737, over 22687.00 frames. ], tot_loss[loss=0.0999, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.0881, over 3759155.68 frames. ], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:14:45,902 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts.
33 from LS+wenet, 23 from Vox, 36 from AS 2024-08-21 07:14:53,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5135850.0, ans=0.125 2024-08-21 07:14:55,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5135950.0, ans=0.09899494936611666 2024-08-21 07:15:32,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=5136050.0, ans=0.2 2024-08-21 07:15:42,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5136150.0, ans=0.0 2024-08-21 07:15:47,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5136150.0, ans=0.1 2024-08-21 07:15:49,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5136150.0, ans=0.1 2024-08-21 07:15:51,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5136150.0, ans=0.1 2024-08-21 07:15:54,757 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9850, loss[loss=0.1121, beats_loss=0.009314, ecapa_loss=0.0001161, whisper_loss=0.1016, over 24894.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01039, ecapa_loss=0.0001385, whisper_loss=0.08859, over 3793027.93 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:15:59,598 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 from AS 2024-08-21 07:16:01,729 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts.
12 from LS+wenet, 12 from Vox, 32 from AS 2024-08-21 07:16:04,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5136250.0, ans=0.0 2024-08-21 07:16:10,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=5136250.0, ans=0.2 2024-08-21 07:16:14,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5136350.0, ans=0.0 2024-08-21 07:16:23,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5136350.0, ans=0.125 2024-08-21 07:16:34,801 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 from AS 2024-08-21 07:16:54,355 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 16 from LS+wenet, 18 from Vox, 33 from AS 2024-08-21 07:16:54,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5136550.0, ans=0.125 2024-08-21 07:16:56,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-08-21 07:17:00,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.250e+01 2.454e+01 2.726e+01 7.431e+01, threshold=4.908e+01, percent-clipped=3.0 2024-08-21 07:17:23,245 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 14 from LS+wenet, 14 from Vox, 26 from AS 2024-08-21 07:17:26,948 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 12 from LS+wenet, 21 from Vox, 20 from AS 2024-08-21 07:17:33,747 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9900, loss[loss=0.0984, beats_loss=0.009185, ecapa_loss=0.000137, whisper_loss=0.08785, over 16812.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01045, ecapa_loss=0.0001385, whisper_loss=0.08862, over 3801733.10 frames.
], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:17:36,226 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:17:44,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2024-08-21 07:18:04,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5136850.0, ans=0.0 2024-08-21 07:18:33,604 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 from AS 2024-08-21 07:18:34,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-08-21 07:18:35,295 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 from AS 2024-08-21 07:18:39,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2024-08-21 07:19:07,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5137250.0, ans=0.125 2024-08-21 07:19:07,789 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 9950, loss[loss=0.1012, beats_loss=0.01264, ecapa_loss=0.0001158, whisper_loss=0.08737, over 21996.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001366, whisper_loss=0.08901, over 3812078.09 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:19:18,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5137250.0, ans=0.125 2024-08-21 07:19:34,469 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts.
19 from LS+wenet, 19 from Vox, 33 from AS 2024-08-21 07:19:36,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5137350.0, ans=0.2 2024-08-21 07:19:40,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5137350.0, ans=0.0 2024-08-21 07:19:45,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5137450.0, ans=0.125 2024-08-21 07:20:11,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.331e+01 2.493e+01 2.737e+01 3.742e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-21 07:20:17,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5137550.0, ans=0.0 2024-08-21 07:20:28,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5137650.0, ans=0.125 2024-08-21 07:20:40,315 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10000, loss[loss=0.09754, beats_loss=0.009566, ecapa_loss=0.0001759, whisper_loss=0.08622, over 20466.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001366, whisper_loss=0.08968, over 3779122.89 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:20:40,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5137750.0, ans=0.1 2024-08-21 07:20:59,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5137850.0, ans=0.0 2024-08-21 07:21:13,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs.
limit=15.0 2024-08-21 07:21:25,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5137950.0, ans=0.125 2024-08-21 07:22:00,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5138150.0, ans=0.2 2024-08-21 07:22:05,126 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 from AS 2024-08-21 07:22:14,657 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10050, loss[loss=0.111, beats_loss=0.01163, ecapa_loss=0.0001354, whisper_loss=0.09799, over 20122.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01054, ecapa_loss=0.0001367, whisper_loss=0.08941, over 3782650.00 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:22:42,020 WARNING [optim.py:496] (0/4) Scaling gradients by 0.01775754615664482, model_norm_threshold=49.858680725097656 2024-08-21 07:22:42,190 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.817e+06, grad_sumsq=1.684e+08, orig_rms_sq=1.079e-02 2024-08-21 07:23:01,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5138450.0, ans=0.07 2024-08-21 07:23:05,243 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 from AS 2024-08-21 07:23:09,508 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 from AS 2024-08-21 07:23:19,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5138550.0, ans=0.035 2024-08-21 07:23:21,688 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts.
27 from LS+wenet, 22 from Vox, 44 from AS 2024-08-21 07:23:27,122 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.254e+01 2.564e+01 3.028e+01 2.808e+03, threshold=5.129e+01, percent-clipped=1.0 2024-08-21 07:23:39,211 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:24:02,643 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10100, loss[loss=0.09275, beats_loss=0.0115, ecapa_loss=0.0001403, whisper_loss=0.07985, over 21936.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.0001371, whisper_loss=0.08947, over 3788609.85 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:24:25,916 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 from AS 2024-08-21 07:24:28,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=5138850.0, ans=0.025 2024-08-21 07:24:28,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-08-21 07:24:30,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5138850.0, ans=0.0 2024-08-21 07:24:37,932 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts.
22 from LS+wenet, 16 from Vox, 30 from AS 2024-08-21 07:24:54,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5138950.0, ans=0.125 2024-08-21 07:25:02,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5139050.0, ans=0.0 2024-08-21 07:25:36,847 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10150, loss[loss=0.1087, beats_loss=0.009695, ecapa_loss=0.0001417, whisper_loss=0.09756, over 23179.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.0001374, whisper_loss=0.08917, over 3805142.09 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:25:43,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.91 vs. limit=15.0 2024-08-21 07:25:48,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5139250.0, ans=0.0 2024-08-21 07:25:51,038 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 27 from LS+wenet, 13 from Vox, 34 from AS 2024-08-21 07:26:08,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5139350.0, ans=0.0 2024-08-21 07:26:10,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5139350.0, ans=0.125 2024-08-21 07:26:12,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=5139350.0, ans=0.02 2024-08-21 07:26:13,910 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 15 from Vox, 44 from AS 2024-08-21 07:26:19,203 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts.
18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-21 07:26:44,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.296e+01 2.505e+01 2.874e+01 3.996e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-21 07:27:05,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=5139650.0, ans=0.05 2024-08-21 07:27:11,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5139650.0, ans=0.025 2024-08-21 07:27:12,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5139650.0, ans=0.125 2024-08-21 07:27:15,371 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10200, loss[loss=0.1042, beats_loss=0.0112, ecapa_loss=0.0001481, whisper_loss=0.09153, over 23218.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01051, ecapa_loss=0.0001368, whisper_loss=0.08945, over 3800928.81 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:27:16,318 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-21 07:27:46,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5139850.0, ans=10.0 2024-08-21 07:27:58,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5139950.0, ans=0.1 2024-08-21 07:28:07,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5139950.0, ans=0.125 2024-08-21 07:28:34,988 WARNING [optim.py:496] (0/4) Scaling gradients by 0.040334705263376236, model_norm_threshold=50.09689712524414 2024-08-21 07:28:35,158 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.841e+05, grad_sumsq=1.841e+05, orig_rms_sq=1.000e+00 2024-08-21 07:28:35,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=5140150.0, ans=0.125 2024-08-21 07:28:44,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5140150.0, ans=0.125 2024-08-21 07:28:46,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=5140150.0, ans=0.07 2024-08-21 07:28:50,994 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10250, loss[loss=0.101, beats_loss=0.01072, ecapa_loss=0.000144, whisper_loss=0.08882, over 22385.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001377, whisper_loss=0.08948, over 3800535.63 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:29:11,930 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 
18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-21 07:29:17,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2024-08-21 07:29:39,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5140450.0, ans=0.0 2024-08-21 07:29:43,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-08-21 07:29:53,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5140550.0, ans=0.1 2024-08-21 07:29:56,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.043e+01 2.288e+01 2.559e+01 2.960e+01 1.242e+03, threshold=5.118e+01, percent-clipped=2.0 2024-08-21 07:29:57,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=5140550.0, ans=0.125 2024-08-21 07:30:08,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5140650.0, ans=0.0 2024-08-21 07:30:17,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=5140650.0, ans=0.0 2024-08-21 07:30:28,603 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10300, loss[loss=0.07591, beats_loss=0.01325, ecapa_loss=0.0001058, whisper_loss=0.0616, over 13246.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001374, whisper_loss=0.08985, over 3797195.93 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:30:31,216 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 
14 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-21 07:31:28,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5140950.0, ans=0.2 2024-08-21 07:31:51,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=5141050.0, ans=0.0 2024-08-21 07:31:55,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5141050.0, ans=0.125 2024-08-21 07:31:58,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-08-21 07:32:24,716 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10350, loss[loss=0.1177, beats_loss=0.009783, ecapa_loss=0.0001384, whisper_loss=0.1066, over 23909.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001376, whisper_loss=0.08999, over 3793322.87 frames. ], batch size: 96, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:32:26,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2024-08-21 07:32:42,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5141250.0, ans=0.125 2024-08-21 07:32:42,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=15.0 2024-08-21 07:32:52,041 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 07:33:14,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5141450.0, ans=0.125 2024-08-21 07:33:20,348 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 
20 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 07:33:28,394 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:33:35,175 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.267e+01 2.630e+01 2.969e+01 5.000e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-21 07:33:41,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5141550.0, ans=0.1 2024-08-21 07:33:48,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5141650.0, ans=0.125 2024-08-21 07:33:49,660 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-21 07:33:49,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5141650.0, ans=0.0 2024-08-21 07:33:49,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5141650.0, ans=0.125 2024-08-21 07:34:05,201 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-21 07:34:08,029 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10400, loss[loss=0.09381, beats_loss=0.01139, ecapa_loss=0.0001473, whisper_loss=0.08095, over 22815.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01039, ecapa_loss=0.0001383, whisper_loss=0.09008, over 3767731.08 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:34:09,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5141750.0, ans=0.125 2024-08-21 07:34:32,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. 
limit=6.0 2024-08-21 07:34:44,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-08-21 07:34:48,386 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 15 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-21 07:35:03,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5141950.0, ans=0.2 2024-08-21 07:35:05,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5141950.0, ans=0.125 2024-08-21 07:35:11,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-21 07:35:29,440 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 13 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-21 07:35:42,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5142150.0, ans=0.0 2024-08-21 07:35:54,747 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10450, loss[loss=0.08708, beats_loss=0.0106, ecapa_loss=0.0001624, whisper_loss=0.07486, over 21898.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001389, whisper_loss=0.09008, over 3790612.89 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:35:54,966 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 26 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 07:35:57,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5142250.0, ans=0.125 2024-08-21 07:36:47,659 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 
31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-21 07:37:18,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.371e+01 2.736e+01 3.043e+01 5.041e+02, threshold=5.472e+01, percent-clipped=3.0 2024-08-21 07:37:24,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5142550.0, ans=0.2 2024-08-21 07:37:35,270 INFO [train_multi_KD3.py:845] (0/4) A total of 96 cuts. 29 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-21 07:37:49,108 INFO [train_multi_KD3.py:845] (0/4) A total of 57 cuts. 20 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-21 07:37:53,570 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10500, loss[loss=0.09286, beats_loss=0.01093, ecapa_loss=0.0001444, whisper_loss=0.08049, over 17947.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001388, whisper_loss=0.08986, over 3806258.32 frames. ], batch size: 75, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:38:00,289 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-21 07:38:00,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5142750.0, ans=0.125 2024-08-21 07:38:02,362 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-21 07:38:14,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5142750.0, ans=0.125 2024-08-21 07:38:16,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5142850.0, ans=0.125 2024-08-21 07:38:38,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5142850.0, ans=0.1 2024-08-21 07:38:49,405 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
26 from LS+wenet, 18 from Vox, 14 fro AS 2024-08-21 07:38:51,260 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 20 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-21 07:39:07,169 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-21 07:39:31,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5143150.0, ans=0.95 2024-08-21 07:39:41,922 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10550, loss[loss=0.07824, beats_loss=0.01084, ecapa_loss=0.0001356, whisper_loss=0.06604, over 13564.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001392, whisper_loss=0.09013, over 3819556.68 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:40:09,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5143350.0, ans=0.2 2024-08-21 07:40:20,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5143350.0, ans=0.125 2024-08-21 07:40:22,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5143450.0, ans=0.1 2024-08-21 07:40:43,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5143550.0, ans=0.1 2024-08-21 07:40:50,835 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.310e+01 2.474e+01 2.751e+01 3.009e+02, threshold=4.947e+01, percent-clipped=3.0 2024-08-21 07:40:55,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2024-08-21 07:41:18,307 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
25 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-21 07:41:21,521 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10600, loss[loss=0.08496, beats_loss=0.01201, ecapa_loss=0.0001367, whisper_loss=0.07158, over 19181.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001398, whisper_loss=0.09016, over 3812977.37 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:41:26,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=5143750.0, ans=0.0 2024-08-21 07:41:30,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=12.0 2024-08-21 07:41:59,455 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 07:42:11,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5143950.0, ans=0.125 2024-08-21 07:42:46,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5144150.0, ans=0.0 2024-08-21 07:42:55,663 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10650, loss[loss=0.1139, beats_loss=0.008012, ecapa_loss=0.0001595, whisper_loss=0.1043, over 21799.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.09032, over 3810438.40 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:44:04,173 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.276e+01 2.540e+01 2.903e+01 1.576e+02, threshold=5.081e+01, percent-clipped=1.0 2024-08-21 07:44:25,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.58 vs. 
limit=15.0 2024-08-21 07:44:33,454 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10700, loss[loss=0.1022, beats_loss=0.008233, ecapa_loss=0.0001375, whisper_loss=0.09256, over 13872.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001385, whisper_loss=0.08949, over 3822471.53 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:44:38,437 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 07:44:42,839 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 36 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-21 07:44:47,754 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-21 07:45:42,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=5145050.0, ans=0.0 2024-08-21 07:45:51,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5145150.0, ans=0.0 2024-08-21 07:46:10,562 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10750, loss[loss=0.08414, beats_loss=0.01215, ecapa_loss=0.0001657, whisper_loss=0.07034, over 19890.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0105, ecapa_loss=0.0001389, whisper_loss=0.0889, over 3800971.81 frames. ], batch size: 84, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:46:26,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. 
limit=15.0 2024-08-21 07:46:30,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5145350.0, ans=0.125 2024-08-21 07:46:53,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=5145450.0, ans=0.0 2024-08-21 07:46:59,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5145450.0, ans=0.2 2024-08-21 07:47:05,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5145450.0, ans=0.0 2024-08-21 07:47:07,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5145550.0, ans=0.125 2024-08-21 07:47:16,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.270e+01 2.527e+01 2.757e+01 4.165e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-21 07:47:36,584 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 36 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-21 07:47:42,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=5145650.0, ans=0.0 2024-08-21 07:47:43,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-08-21 07:47:47,531 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10800, loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001259, whisper_loss=0.0905, over 20324.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01051, ecapa_loss=0.0001383, whisper_loss=0.0888, over 3819970.37 frames. ], batch size: 77, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:48:05,810 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 
9 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 07:48:11,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=5145850.0, ans=0.05 2024-08-21 07:48:33,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2024-08-21 07:48:42,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2024-08-21 07:48:50,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5146050.0, ans=0.125 2024-08-21 07:48:57,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5146050.0, ans=0.125 2024-08-21 07:48:59,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5146050.0, ans=0.125 2024-08-21 07:49:14,353 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 29 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-21 07:49:17,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-08-21 07:49:20,940 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10850, loss[loss=0.1221, beats_loss=0.008271, ecapa_loss=0.0001353, whisper_loss=0.1124, over 23362.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01048, ecapa_loss=0.0001374, whisper_loss=0.08903, over 3804220.72 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:49:21,734 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-21 07:49:46,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=5146350.0, ans=0.125 2024-08-21 07:49:55,485 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 16 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-21 07:49:55,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=5146350.0, ans=0.2 2024-08-21 07:50:12,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5146450.0, ans=0.125 2024-08-21 07:50:15,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5146550.0, ans=0.125 2024-08-21 07:50:23,460 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.344e+01 2.543e+01 2.878e+01 8.431e+01, threshold=5.085e+01, percent-clipped=1.0 2024-08-21 07:50:43,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5146650.0, ans=0.125 2024-08-21 07:50:49,931 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-21 07:50:52,686 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10900, loss[loss=0.1225, beats_loss=0.009941, ecapa_loss=0.0001338, whisper_loss=0.1112, over 22141.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0105, ecapa_loss=0.0001382, whisper_loss=0.08874, over 3806272.23 frames. 
], batch size: 86, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:51:00,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5146750.0, ans=0.125 2024-08-21 07:51:04,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5146750.0, ans=0.125 2024-08-21 07:51:09,781 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 17 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-21 07:52:23,186 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 10950, loss[loss=0.1055, beats_loss=0.01259, ecapa_loss=0.000132, whisper_loss=0.09161, over 23981.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001383, whisper_loss=0.08982, over 3808657.90 frames. ], batch size: 95, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:52:26,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=5147250.0, ans=0.125 2024-08-21 07:52:37,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5147250.0, ans=0.1 2024-08-21 07:52:53,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5147350.0, ans=0.125 2024-08-21 07:52:55,427 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-21 07:53:22,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.337e+01 2.519e+01 2.828e+01 1.066e+02, threshold=5.038e+01, percent-clipped=2.0 2024-08-21 07:53:27,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5147550.0, ans=0.2 2024-08-21 07:53:29,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5147550.0, ans=0.0 2024-08-21 07:53:48,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.50 vs. limit=22.5 2024-08-21 07:53:52,847 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11000, loss[loss=0.1048, beats_loss=0.01233, ecapa_loss=0.0001202, whisper_loss=0.09125, over 20938.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001389, whisper_loss=0.09026, over 3773040.91 frames. ], batch size: 82, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:54:15,121 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 27 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-21 07:54:16,912 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 19 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-21 07:54:18,869 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 07:54:31,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. 
limit=12.0 2024-08-21 07:54:34,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5147950.0, ans=0.0 2024-08-21 07:54:39,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=5147950.0, ans=0.125 2024-08-21 07:54:40,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.98 vs. limit=5.0 2024-08-21 07:54:42,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5147950.0, ans=0.125 2024-08-21 07:54:54,145 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-21 07:55:21,764 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11050, loss[loss=0.115, beats_loss=0.008109, ecapa_loss=0.0001552, whisper_loss=0.1054, over 17586.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001401, whisper_loss=0.09111, over 3809806.69 frames. ], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:55:21,973 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 14 from LS+wenet, 21 from Vox, 16 fro AS 2024-08-21 07:55:23,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5148250.0, ans=0.125 2024-08-21 07:55:34,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.52 vs. limit=10.0 2024-08-21 07:55:45,332 INFO [train_multi_KD3.py:845] (0/4) A total of 84 cuts. 
27 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-21 07:55:56,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=5148450.0, ans=0.09899494936611666 2024-08-21 07:56:00,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5148450.0, ans=0.0 2024-08-21 07:56:06,526 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 19 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-21 07:56:18,175 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 28 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-21 07:56:19,086 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.275e+01 2.484e+01 2.790e+01 7.658e+01, threshold=4.968e+01, percent-clipped=1.0 2024-08-21 07:56:27,298 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-21 07:56:48,757 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11100, loss[loss=0.08643, beats_loss=0.01128, ecapa_loss=0.000145, whisper_loss=0.0737, over 19967.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001403, whisper_loss=0.09076, over 3818434.36 frames. ], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:57:07,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5148850.0, ans=0.1 2024-08-21 07:57:19,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5148850.0, ans=0.0 2024-08-21 07:57:23,105 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 07:57:26,468 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 
33 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 07:57:47,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5149050.0, ans=0.2 2024-08-21 07:57:58,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2024-08-21 07:58:18,686 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11150, loss[loss=0.1027, beats_loss=0.008465, ecapa_loss=0.0001302, whisper_loss=0.09291, over 14755.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.09075, over 3834385.99 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:58:30,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=5149250.0, ans=0.5 2024-08-21 07:58:36,174 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-21 07:58:38,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=5149350.0, ans=0.2 2024-08-21 07:58:58,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5149450.0, ans=0.0 2024-08-21 07:59:03,084 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 07:59:17,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.340e+01 2.550e+01 2.884e+01 1.372e+02, threshold=5.100e+01, percent-clipped=2.0 2024-08-21 07:59:20,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5149550.0, ans=0.1 2024-08-21 07:59:34,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5149650.0, ans=0.125 2024-08-21 07:59:39,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5149650.0, ans=0.1 2024-08-21 07:59:46,178 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11200, loss[loss=0.09904, beats_loss=0.01255, ecapa_loss=0.0001049, whisper_loss=0.08544, over 23319.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001386, whisper_loss=0.09056, over 3829490.37 frames. ], batch size: 89, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 07:59:47,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5149750.0, ans=0.125 2024-08-21 08:00:11,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=5149850.0, ans=0.02 2024-08-21 08:00:26,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5149950.0, ans=0.0 2024-08-21 08:01:29,911 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11250, loss[loss=0.08688, beats_loss=0.01177, ecapa_loss=0.0001268, whisper_loss=0.07384, over 21181.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001383, whisper_loss=0.09001, over 3842706.16 frames. 
], batch size: 87, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:01:36,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5150250.0, ans=0.1 2024-08-21 08:01:37,996 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 31 from LS+wenet, 26 from Vox, 31 from AS 2024-08-21 08:01:53,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=5150350.0, ans=0.0 2024-08-21 08:02:05,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5150350.0, ans=0.0 2024-08-21 08:02:42,096 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.342e+01 2.612e+01 2.998e+01 2.607e+02, threshold=5.224e+01, percent-clipped=1.0 2024-08-21 08:02:47,816 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 25 from LS+wenet, 15 from Vox, 41 from AS 2024-08-21 08:02:56,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5150650.0, ans=0.125 2024-08-21 08:02:59,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5150650.0, ans=0.125 2024-08-21 08:03:15,126 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11300, loss[loss=0.104, beats_loss=0.01068, ecapa_loss=0.0001457, whisper_loss=0.09183, over 20173.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001382, whisper_loss=0.08997, over 3832209.96 frames. 
], batch size: 81, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:03:18,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5150750.0, ans=0.125 2024-08-21 08:03:24,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=5150750.0, ans=0.2 2024-08-21 08:03:31,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5150750.0, ans=0.1 2024-08-21 08:03:40,658 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 22 from Vox, 34 from AS 2024-08-21 08:03:44,571 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 27 from Vox, 38 from AS 2024-08-21 08:03:44,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.58 vs. limit=10.0 2024-08-21 08:03:48,999 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 from AS 2024-08-21 08:03:49,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-08-21 08:04:00,580 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 15 from Vox, 50 from AS 2024-08-21 08:04:46,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5151150.0, ans=0.125 2024-08-21 08:04:54,572 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11350, loss[loss=0.1124, beats_loss=0.009372, ecapa_loss=0.0001448, whisper_loss=0.1015, over 18444.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0104, ecapa_loss=0.0001378, whisper_loss=0.08965, over 3806581.88 frames. 
], batch size: 70, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:05:11,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5151350.0, ans=0.125 2024-08-21 08:05:11,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5151350.0, ans=0.125 2024-08-21 08:05:39,813 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-21 08:05:42,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2024-08-21 08:05:57,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.232e+01 2.527e+01 2.803e+01 3.759e+01, threshold=5.053e+01, percent-clipped=0.0 2024-08-21 08:05:59,995 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:06:20,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5151650.0, ans=0.1 2024-08-21 08:06:24,450 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:06:26,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5151750.0, ans=0.07 2024-08-21 08:06:26,843 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11400, loss[loss=0.1037, beats_loss=0.0114, ecapa_loss=0.0001578, whisper_loss=0.09074, over 23506.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01028, ecapa_loss=0.0001391, whisper_loss=0.09032, over 3813693.80 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:06:38,074 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 
21 from LS+wenet, 15 from Vox, 30 from AS 2024-08-21 08:06:42,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5151750.0, ans=0.0 2024-08-21 08:06:47,707 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 20 from Vox, 25 from AS 2024-08-21 08:07:00,103 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 25 from LS+wenet, 21 from Vox, 48 from AS 2024-08-21 08:07:09,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-08-21 08:07:19,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.13 vs. limit=22.5 2024-08-21 08:07:45,121 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 from AS 2024-08-21 08:07:45,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2024-08-21 08:07:45,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2024-08-21 08:07:50,493 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 from AS 2024-08-21 08:08:06,651 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11450, loss[loss=0.1012, beats_loss=0.007069, ecapa_loss=0.0001702, whisper_loss=0.09238, over 15126.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.09036, over 3835034.78 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:08:22,046 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
17 from LS+wenet, 31 from Vox, 45 from AS 2024-08-21 08:08:22,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5152250.0, ans=0.1 2024-08-21 08:08:30,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5152350.0, ans=0.1 2024-08-21 08:08:38,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5152350.0, ans=0.0 2024-08-21 08:08:43,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=5152350.0, ans=0.0 2024-08-21 08:08:51,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5152450.0, ans=0.125 2024-08-21 08:08:55,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5152450.0, ans=0.2 2024-08-21 08:08:59,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=5152450.0, ans=0.2 2024-08-21 08:09:01,411 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 28 from LS+wenet, 21 from Vox, 39 from AS 2024-08-21 08:09:14,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.353e+01 2.552e+01 2.800e+01 3.552e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-21 08:09:20,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. 
limit=15.0 2024-08-21 08:09:22,527 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:09:46,860 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11500, loss[loss=0.08865, beats_loss=0.009632, ecapa_loss=0.0001392, whisper_loss=0.07763, over 13763.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01029, ecapa_loss=0.0001389, whisper_loss=0.09057, over 3823380.29 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:09:59,150 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 from AS 2024-08-21 08:10:17,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-21 08:11:23,287 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11550, loss[loss=0.08863, beats_loss=0.008336, ecapa_loss=0.0001424, whisper_loss=0.07887, over 13984.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01024, ecapa_loss=0.0001386, whisper_loss=0.09011, over 3841176.94 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:11:34,600 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 from AS 2024-08-21 08:11:40,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5153350.0, ans=0.1 2024-08-21 08:12:23,165 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
27 from LS+wenet, 21 from Vox, 39 from AS 2024-08-21 08:12:27,598 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.384e+01 2.690e+01 2.968e+01 5.018e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-21 08:12:30,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5153550.0, ans=0.2 2024-08-21 08:12:34,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.06 vs. limit=5.0 2024-08-21 08:12:51,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2024-08-21 08:12:54,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5153650.0, ans=0.0 2024-08-21 08:12:57,141 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11600, loss[loss=0.1027, beats_loss=0.01196, ecapa_loss=0.000156, whisper_loss=0.08917, over 19422.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001376, whisper_loss=0.0899, over 3871983.75 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:13:04,802 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 from AS 2024-08-21 08:13:19,452 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 from AS 2024-08-21 08:13:40,549 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 14 from LS+wenet, 16 from Vox, 25 from AS 2024-08-21 08:13:48,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5153950.0, ans=0.125 2024-08-21 08:13:54,296 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 
16 from LS+wenet, 14 from Vox, 23 from AS 2024-08-21 08:14:09,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5154050.0, ans=0.125 2024-08-21 08:14:09,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5154050.0, ans=0.1 2024-08-21 08:14:14,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5154150.0, ans=0.2 2024-08-21 08:14:25,567 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 from AS 2024-08-21 08:14:33,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-21 08:14:35,497 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11650, loss[loss=0.1042, beats_loss=0.01104, ecapa_loss=0.0001221, whisper_loss=0.09189, over 23056.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01034, ecapa_loss=0.0001374, whisper_loss=0.0899, over 3856554.40 frames. ], batch size: 92, lr: 1.74e-03, grad_scale: 1.152921504606847e+18 2024-08-21 08:14:59,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-08-21 08:15:05,597 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 21 from LS+wenet, 24 from Vox, 49 from AS 2024-08-21 08:15:13,871 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 from AS 2024-08-21 08:15:15,660 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 
20 from LS+wenet, 14 from Vox, 34 from AS 2024-08-21 08:15:15,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5154450.0, ans=0.1 2024-08-21 08:15:21,105 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.014e-01 2024-08-21 08:15:35,411 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS 2024-08-21 08:15:40,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.291e+01 2.549e+01 2.927e+01 7.915e+01, threshold=5.097e+01, percent-clipped=1.0 2024-08-21 08:15:52,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=5154650.0, ans=0.2 2024-08-21 08:15:55,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5154650.0, ans=0.0 2024-08-21 08:16:04,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5154650.0, ans=0.125 2024-08-21 08:16:11,417 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11700, loss[loss=0.1071, beats_loss=0.009918, ecapa_loss=0.0001739, whisper_loss=0.09541, over 14747.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001374, whisper_loss=0.0896, over 3848228.11 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:16:15,812 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 22 from LS+wenet, 28 from Vox, 36 from AS 2024-08-21 08:16:27,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=5154750.0, ans=0.125 2024-08-21 08:16:31,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=5154850.0, ans=0.0 2024-08-21 08:16:52,622 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 
24 from LS+wenet, 24 from Vox, 28 from AS 2024-08-21 08:16:53,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.26 vs. limit=22.5 2024-08-21 08:17:08,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5155050.0, ans=0.0 2024-08-21 08:17:13,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5155050.0, ans=0.125 2024-08-21 08:17:31,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0 2024-08-21 08:17:41,259 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11750, loss[loss=0.1083, beats_loss=0.008128, ecapa_loss=0.0001693, whisper_loss=0.09851, over 17881.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001378, whisper_loss=0.08973, over 3839459.11 frames. ], batch size: 73, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:18:22,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5155450.0, ans=0.2 2024-08-21 08:18:31,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2024-08-21 08:18:41,213 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.496e+01 2.849e+01 3.219e+01 3.241e+02, threshold=5.697e+01, percent-clipped=3.0 2024-08-21 08:19:07,369 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11800, loss[loss=0.09475, beats_loss=0.01207, ecapa_loss=0.0001349, whisper_loss=0.08133, over 19589.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001377, whisper_loss=0.08992, over 3835753.66 frames. 
], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:19:09,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5155750.0, ans=0.1 2024-08-21 08:19:13,617 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 25 from LS+wenet, 13 from Vox, 21 from AS 2024-08-21 08:19:43,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5155850.0, ans=0.1 2024-08-21 08:19:50,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2024-08-21 08:20:19,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5 2024-08-21 08:20:57,622 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11850, loss[loss=0.0924, beats_loss=0.00862, ecapa_loss=0.0001526, whisper_loss=0.08225, over 19758.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001385, whisper_loss=0.09058, over 3816652.72 frames. ], batch size: 83, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:21:02,856 INFO [train_multi_KD3.py:845] (0/4) A total of 79 cuts. 28 from LS+wenet, 18 from Vox, 33 from AS 2024-08-21 08:21:12,816 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 24 from LS+wenet, 30 from Vox, 26 from AS 2024-08-21 08:21:13,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=5156250.0, ans=0.2 2024-08-21 08:21:16,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=5156250.0, ans=0.0 2024-08-21 08:21:25,103 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
36 from LS+wenet, 24 from Vox, 30 from AS 2024-08-21 08:21:27,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5156350.0, ans=0.0 2024-08-21 08:21:34,745 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 from AS 2024-08-21 08:21:34,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5156350.0, ans=0.125 2024-08-21 08:21:48,682 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 from AS 2024-08-21 08:22:10,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5156550.0, ans=0.125 2024-08-21 08:22:17,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.264e+01 2.462e+01 2.768e+01 4.199e+02, threshold=4.924e+01, percent-clipped=1.0 2024-08-21 08:22:28,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5156650.0, ans=0.125 2024-08-21 08:22:48,156 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11900, loss[loss=0.09453, beats_loss=0.01119, ecapa_loss=0.0001491, whisper_loss=0.08185, over 22355.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.000138, whisper_loss=0.09056, over 3824111.57 frames. ], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:22:48,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5156750.0, ans=0.1 2024-08-21 08:23:00,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. 
limit=15.0 2024-08-21 08:23:02,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5156750.0, ans=0.1 2024-08-21 08:23:22,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5156850.0, ans=0.125 2024-08-21 08:23:26,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-08-21 08:24:03,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5157050.0, ans=0.125 2024-08-21 08:24:07,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5157050.0, ans=0.0 2024-08-21 08:24:14,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.01 vs. limit=10.0 2024-08-21 08:24:32,315 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 11950, loss[loss=0.07723, beats_loss=0.01405, ecapa_loss=0.0001343, whisper_loss=0.06184, over 20754.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001389, whisper_loss=0.0901, over 3816041.88 frames. ], batch size: 88, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:24:35,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5157250.0, ans=0.0 2024-08-21 08:24:49,377 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
28 from LS+wenet, 26 from Vox, 35 from AS 2024-08-21 08:24:51,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=5157250.0, ans=0.125 2024-08-21 08:24:54,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5157350.0, ans=0.07 2024-08-21 08:25:18,070 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 23 from LS+wenet, 24 from Vox, 42 from AS 2024-08-21 08:25:36,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5157450.0, ans=0.125 2024-08-21 08:25:36,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2024-08-21 08:25:50,798 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.272e+01 2.506e+01 2.845e+01 4.517e+01, threshold=5.012e+01, percent-clipped=0.0 2024-08-21 08:26:04,566 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 24 from LS+wenet, 21 from Vox, 29 from AS 2024-08-21 08:26:21,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=5157650.0, ans=0.0 2024-08-21 08:26:27,515 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12000, loss[loss=0.07771, beats_loss=0.01231, ecapa_loss=0.0001329, whisper_loss=0.06407, over 22607.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.00014, whisper_loss=0.08977, over 3865137.01 frames. 
], batch size: 94, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:26:27,517 INFO [train_multi_KD3.py:1140] (0/4) Computing validation loss 2024-08-21 08:27:01,235 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3879, 4.2088, 3.8535, 3.6907], device='cuda:0') 2024-08-21 08:27:05,308 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on ASR_libri: loss=0.2549, beats_loss=0, ecapa_loss=0.0005016, whisper_loss=0.2499, over 931116.00 frames. 2024-08-21 08:27:31,544 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on SV_voxceleb1: loss=0.00396, beats_loss=0, ecapa_loss=0.000396, whisper_loss=0, over 944235.00 frames. 2024-08-21 08:27:41,169 INFO [zipformer.py:1877] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.5774, 2.1720, 2.0896, 1.9661], device='cuda:0') 2024-08-21 08:29:17,059 INFO [train_multi_KD3.py:1150] (0/4) Epoch 35, validation on AT_audioset: loss=0.02299, beats_loss=0.02299, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-21 08:29:17,063 INFO [train_multi_KD3.py:1156] (0/4) Maximum memory allocated so far is 32775MB 2024-08-21 08:29:47,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-21 08:29:49,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5157850.0, ans=0.0 2024-08-21 08:29:57,777 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.054e+01 2024-08-21 08:30:03,247 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 from AS 2024-08-21 08:30:10,719 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
23 from LS+wenet, 19 from Vox, 28 from AS 2024-08-21 08:30:31,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5158150.0, ans=0.0 2024-08-21 08:30:49,447 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12050, loss[loss=0.08359, beats_loss=0.0104, ecapa_loss=0.0001657, whisper_loss=0.07153, over 15616.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001406, whisper_loss=0.09005, over 3843973.64 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:30:53,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5158250.0, ans=0.1 2024-08-21 08:30:57,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=5158250.0, ans=0.125 2024-08-21 08:31:01,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5158250.0, ans=0.125 2024-08-21 08:31:12,796 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 from AS 2024-08-21 08:31:16,815 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 from AS 2024-08-21 08:31:19,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5158350.0, ans=0.0 2024-08-21 08:31:26,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.19 vs. 
limit=10.0 2024-08-21 08:31:40,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=5158450.0, ans=0.125 2024-08-21 08:31:46,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=5158550.0, ans=0.0 2024-08-21 08:31:48,234 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 31 from LS+wenet, 28 from Vox, 28 from AS 2024-08-21 08:31:56,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.247e+01 2.409e+01 2.694e+01 3.930e+01, threshold=4.818e+01, percent-clipped=0.0 2024-08-21 08:32:02,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5158550.0, ans=0.1 2024-08-21 08:32:13,070 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 from AS 2024-08-21 08:32:16,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5158650.0, ans=0.2 2024-08-21 08:32:31,275 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12100, loss[loss=0.08763, beats_loss=0.01096, ecapa_loss=0.0001387, whisper_loss=0.07529, over 18763.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001403, whisper_loss=0.08936, over 3848831.20 frames. ], batch size: 76, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:32:54,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. 
limit=10.0 2024-08-21 08:33:36,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5159050.0, ans=0.125 2024-08-21 08:33:40,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5159050.0, ans=0.05 2024-08-21 08:33:51,246 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 17 from LS+wenet, 20 from Vox, 31 from AS 2024-08-21 08:34:00,367 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 from AS 2024-08-21 08:34:10,955 INFO [train_multi_KD3.py:845] (0/4) A total of 78 cuts. 28 from LS+wenet, 16 from Vox, 34 from AS 2024-08-21 08:34:17,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2024-08-21 08:34:24,159 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12150, loss[loss=0.09817, beats_loss=0.008611, ecapa_loss=0.0001955, whisper_loss=0.0876, over 20183.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01042, ecapa_loss=0.0001411, whisper_loss=0.08985, over 3888784.60 frames. ], batch size: 90, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:34:47,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=5159350.0, ans=10.0 2024-08-21 08:35:28,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.279e+01 2.549e+01 2.837e+01 4.060e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-21 08:35:30,918 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 20 from LS+wenet, 12 from Vox, 18 from AS 2024-08-21 08:35:45,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.31 vs. 
limit=12.0 2024-08-21 08:35:50,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5159650.0, ans=0.0 2024-08-21 08:35:54,436 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12200, loss[loss=0.1064, beats_loss=0.008048, ecapa_loss=0.0001491, whisper_loss=0.09691, over 20065.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001419, whisper_loss=0.08968, over 3851864.61 frames. ], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:36:01,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5159750.0, ans=0.125 2024-08-21 08:36:08,527 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 33 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-21 08:36:14,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=5159850.0, ans=0.2 2024-08-21 08:36:22,899 INFO [train_multi_KD3.py:845] (0/4) A total of 63 cuts. 
18 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-21 08:36:24,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5159850.0, ans=0.1 2024-08-21 08:36:33,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5159950.0, ans=0.1 2024-08-21 08:36:35,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5159950.0, ans=0.125 2024-08-21 08:36:36,884 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-516000.pt 2024-08-21 08:36:44,469 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09456772357225418, model_norm_threshold=50.97699737548828 2024-08-21 08:36:44,640 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.739e+04, grad_sumsq=4.739e+04, orig_rms_sq=1.000e+00 2024-08-21 08:37:17,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5160150.0, ans=0.125 2024-08-21 08:37:20,144 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 21 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-21 08:37:23,092 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12250, loss[loss=0.09374, beats_loss=0.01369, ecapa_loss=0.0001177, whisper_loss=0.07887, over 15503.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001411, whisper_loss=0.08926, over 3823342.37 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:37:48,525 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-21 08:38:03,223 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-21 08:38:21,623 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-21 08:38:25,492 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.259e+01 2.495e+01 2.764e+01 5.391e+02, threshold=4.989e+01, percent-clipped=1.0 2024-08-21 08:38:52,667 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12300, loss[loss=0.08881, beats_loss=0.01253, ecapa_loss=0.0001653, whisper_loss=0.07462, over 18186.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01051, ecapa_loss=0.0001399, whisper_loss=0.08898, over 3805171.63 frames. ], batch size: 80, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:39:13,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-08-21 08:39:15,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=5160850.0, ans=15.0 2024-08-21 08:39:23,753 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-21 08:39:26,076 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-21 08:39:28,373 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 
21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-21 08:39:34,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5160950.0, ans=0.0 2024-08-21 08:39:38,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-21 08:39:50,710 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-21 08:39:53,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=5161050.0, ans=0.0 2024-08-21 08:40:26,983 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12350, loss[loss=0.1167, beats_loss=0.00958, ecapa_loss=0.0001483, whisper_loss=0.1056, over 23319.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001404, whisper_loss=0.08927, over 3842685.77 frames. ], batch size: 93, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:40:27,899 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-21 08:41:07,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5161450.0, ans=0.125 2024-08-21 08:41:12,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5161450.0, ans=0.2 2024-08-21 08:41:14,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-21 08:41:15,985 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 
15 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-21 08:41:29,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5161550.0, ans=0.125 2024-08-21 08:41:30,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.295e+01 2.536e+01 2.877e+01 1.914e+02, threshold=5.073e+01, percent-clipped=2.0 2024-08-21 08:41:36,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5161550.0, ans=0.125 2024-08-21 08:41:38,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5161650.0, ans=0.125 2024-08-21 08:41:42,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5161650.0, ans=0.125 2024-08-21 08:41:48,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5161650.0, ans=0.1 2024-08-21 08:41:57,742 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12400, loss[loss=0.09901, beats_loss=0.00985, ecapa_loss=0.0001383, whisper_loss=0.08778, over 22391.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01048, ecapa_loss=0.0001405, whisper_loss=0.0886, over 3798866.49 frames. ], batch size: 91, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:42:12,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.05 vs. 
limit=6.0 2024-08-21 08:42:14,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5161750.0, ans=0.125 2024-08-21 08:42:44,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5161950.0, ans=0.125 2024-08-21 08:43:42,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5162150.0, ans=0.1 2024-08-21 08:43:44,343 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12450, loss[loss=0.1228, beats_loss=0.01141, ecapa_loss=0.0001068, whisper_loss=0.1103, over 17632.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01055, ecapa_loss=0.0001394, whisper_loss=0.08864, over 3813565.79 frames. ], batch size: 67, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:44:17,707 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 29 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-21 08:44:20,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=5162350.0, ans=0.0 2024-08-21 08:44:56,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.293e+01 2.503e+01 2.840e+01 4.657e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-21 08:45:20,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5162650.0, ans=0.2 2024-08-21 08:45:27,621 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12500, loss[loss=0.06896, beats_loss=0.012, ecapa_loss=0.0001706, whisper_loss=0.05525, over 11850.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01049, ecapa_loss=0.0001395, whisper_loss=0.08904, over 3822388.20 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:46:04,621 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-21 08:46:11,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=25.85 vs. limit=22.5 2024-08-21 08:46:13,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5162950.0, ans=0.0 2024-08-21 08:46:42,289 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-21 08:46:42,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-21 08:46:50,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5163150.0, ans=0.1 2024-08-21 08:46:55,981 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-21 08:46:57,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5163150.0, ans=0.2 2024-08-21 08:47:08,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5163150.0, ans=0.1 2024-08-21 08:47:10,791 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12550, loss[loss=0.1146, beats_loss=0.008063, ecapa_loss=0.0001799, whisper_loss=0.1047, over 16782.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01045, ecapa_loss=0.0001403, whisper_loss=0.08863, over 3798160.32 frames. 
], batch size: 68, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:47:13,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5163250.0, ans=0.125 2024-08-21 08:47:23,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5163250.0, ans=0.125 2024-08-21 08:47:47,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=5163350.0, ans=0.0 2024-08-21 08:47:51,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5163450.0, ans=0.125 2024-08-21 08:48:07,619 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 26 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-21 08:48:13,103 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 08:48:24,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5163550.0, ans=0.125 2024-08-21 08:48:26,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.391e+01 2.623e+01 3.039e+01 4.282e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-21 08:48:40,527 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-21 08:48:40,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=5163650.0, ans=0.04949747468305833 2024-08-21 08:48:56,026 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12600, loss[loss=0.08187, beats_loss=0.009827, ecapa_loss=0.0001265, whisper_loss=0.07078, over 16733.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01049, ecapa_loss=0.0001405, whisper_loss=0.08832, over 3816397.27 frames. 
], batch size: 63, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:48:57,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=5163750.0, ans=0.07 2024-08-21 08:49:47,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5163950.0, ans=0.125 2024-08-21 08:50:12,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5164150.0, ans=0.0 2024-08-21 08:50:15,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5164150.0, ans=0.125 2024-08-21 08:50:27,510 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12650, loss[loss=0.08079, beats_loss=0.01027, ecapa_loss=0.0001461, whisper_loss=0.06906, over 15452.00 frames. ], tot_loss[loss=0.1, beats_loss=0.0105, ecapa_loss=0.000139, whisper_loss=0.08813, over 3808739.96 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:50:37,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.13 vs. limit=5.0 2024-08-21 08:50:42,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=5164250.0, ans=0.07 2024-08-21 08:50:50,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5164350.0, ans=0.0 2024-08-21 08:50:55,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5164350.0, ans=0.125 2024-08-21 08:51:01,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. 
limit=22.5 2024-08-21 08:51:05,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5164450.0, ans=0.125 2024-08-21 08:51:06,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.02 vs. limit=22.5 2024-08-21 08:51:10,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-08-21 08:51:30,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+01 2.313e+01 2.541e+01 2.786e+01 6.490e+01, threshold=5.082e+01, percent-clipped=1.0 2024-08-21 08:51:37,677 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-21 08:51:37,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=5164650.0, ans=0.2 2024-08-21 08:51:53,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2024-08-21 08:51:56,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-21 08:51:58,200 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12700, loss[loss=0.1024, beats_loss=0.01253, ecapa_loss=9.611e-05, whisper_loss=0.08888, over 17568.00 frames. ], tot_loss[loss=0.09997, beats_loss=0.01048, ecapa_loss=0.0001383, whisper_loss=0.08811, over 3833713.10 frames. 
], batch size: 66, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:52:22,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5164850.0, ans=0.1 2024-08-21 08:52:45,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=5164950.0, ans=0.2 2024-08-21 08:53:03,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5165050.0, ans=0.125 2024-08-21 08:53:13,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5165050.0, ans=0.125 2024-08-21 08:53:33,829 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0630233883857727, model_norm_threshold=50.820472717285156 2024-08-21 08:53:34,303 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.38, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.448e+05, grad_sumsq=2.272e+07, orig_rms_sq=1.077e-02 2024-08-21 08:53:48,957 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12750, loss[loss=0.1161, beats_loss=0.008867, ecapa_loss=0.0001382, whisper_loss=0.1058, over 20397.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001377, whisper_loss=0.08938, over 3847339.66 frames. ], batch size: 78, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:53:54,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. limit=10.0 2024-08-21 08:54:05,371 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-21 08:54:24,859 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 08:54:33,690 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-21 08:55:07,809 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.308e+01 2.534e+01 2.848e+01 8.064e+02, threshold=5.067e+01, percent-clipped=1.0 2024-08-21 08:55:11,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=5165550.0, ans=0.05 2024-08-21 08:55:13,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5165550.0, ans=0.125 2024-08-21 08:55:15,905 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 08:55:17,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5165650.0, ans=0.1 2024-08-21 08:55:32,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=12.0 2024-08-21 08:55:43,317 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12800, loss[loss=0.1002, beats_loss=0.007031, ecapa_loss=0.0001323, whisper_loss=0.09181, over 15823.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001378, whisper_loss=0.08975, over 3859946.66 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:56:15,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=5165850.0, ans=0.125 2024-08-21 08:56:27,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5165950.0, ans=0.1 2024-08-21 08:56:31,622 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 
21 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-21 08:57:00,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5166050.0, ans=0.0 2024-08-21 08:57:06,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=5166050.0, ans=0.125 2024-08-21 08:57:19,590 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 30 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-21 08:57:21,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5166150.0, ans=0.0 2024-08-21 08:57:27,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2024-08-21 08:57:29,846 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 25 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-21 08:57:36,105 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12850, loss[loss=0.09902, beats_loss=0.01108, ecapa_loss=0.0001539, whisper_loss=0.08641, over 18740.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.08974, over 3851293.74 frames. ], batch size: 78, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:58:01,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5166350.0, ans=0.0 2024-08-21 08:58:03,659 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 
18 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-21 08:58:08,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=5166350.0, ans=0.125 2024-08-21 08:58:08,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5166350.0, ans=0.1 2024-08-21 08:59:02,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.663e+01 2.226e+01 2.436e+01 2.740e+01 3.525e+01, threshold=4.872e+01, percent-clipped=0.0 2024-08-21 08:59:03,738 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-21 08:59:36,019 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12900, loss[loss=0.1092, beats_loss=0.01234, ecapa_loss=0.0001199, whisper_loss=0.09565, over 14961.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001382, whisper_loss=0.0898, over 3849868.92 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 08:59:52,740 INFO [train_multi_KD3.py:845] (0/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 09:00:14,048 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 23 from LS+wenet, 30 from Vox, 27 fro AS 2024-08-21 09:00:42,353 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-21 09:00:44,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=5167050.0, ans=0.0 2024-08-21 09:01:06,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.69 vs. 
limit=15.0 2024-08-21 09:01:09,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5167050.0, ans=0.125 2024-08-21 09:01:23,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2024-08-21 09:01:31,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5167150.0, ans=0.0 2024-08-21 09:01:41,512 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 12950, loss[loss=0.07977, beats_loss=0.01029, ecapa_loss=0.0001344, whisper_loss=0.06813, over 17682.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001389, whisper_loss=0.09038, over 3829384.89 frames. ], batch size: 71, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:01:47,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5167250.0, ans=0.125 2024-08-21 09:01:50,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=5167250.0, ans=0.125 2024-08-21 09:02:03,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5167250.0, ans=0.1 2024-08-21 09:02:23,508 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-21 09:02:26,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5167350.0, ans=0.125 2024-08-21 09:02:29,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5167350.0, ans=0.125 2024-08-21 09:02:47,839 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
24 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-21 09:03:14,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.293e+01 2.527e+01 2.916e+01 2.821e+02, threshold=5.054e+01, percent-clipped=3.0 2024-08-21 09:03:37,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=5167650.0, ans=0.125 2024-08-21 09:03:45,359 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-21 09:03:51,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5167750.0, ans=0.1 2024-08-21 09:03:53,541 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13000, loss[loss=0.08477, beats_loss=0.01277, ecapa_loss=0.0001128, whisper_loss=0.07088, over 23287.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001377, whisper_loss=0.08967, over 3861611.65 frames. ], batch size: 96, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:04:01,315 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 15 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-21 09:04:17,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5167850.0, ans=0.125 2024-08-21 09:04:23,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. 
limit=15.0 2024-08-21 09:04:41,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5167850.0, ans=0.125 2024-08-21 09:04:46,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5167950.0, ans=0.1 2024-08-21 09:04:55,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=22.5 2024-08-21 09:05:00,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5167950.0, ans=0.125 2024-08-21 09:05:12,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5168050.0, ans=0.1 2024-08-21 09:05:28,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5168150.0, ans=0.0 2024-08-21 09:05:47,086 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13050, loss[loss=0.09666, beats_loss=0.01101, ecapa_loss=0.0001265, whisper_loss=0.08439, over 21990.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.000138, whisper_loss=0.08958, over 3854473.58 frames. 
], batch size: 91, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:05:57,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5168250.0, ans=0.1 2024-08-21 09:05:58,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5168250.0, ans=0.125 2024-08-21 09:06:03,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5168250.0, ans=0.0 2024-08-21 09:06:03,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-08-21 09:06:08,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=5168350.0, ans=0.0 2024-08-21 09:06:10,358 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-21 09:06:13,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.39 vs. limit=10.0 2024-08-21 09:06:50,845 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-21 09:06:53,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.244e+01 2.565e+01 2.822e+01 8.760e+01, threshold=5.130e+01, percent-clipped=2.0 2024-08-21 09:07:10,160 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-21 09:07:26,639 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13100, loss[loss=0.1114, beats_loss=0.009038, ecapa_loss=0.0001732, whisper_loss=0.1006, over 22822.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001381, whisper_loss=0.08989, over 3853428.13 frames. 
], batch size: 94, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:07:29,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=5168750.0, ans=0.125 2024-08-21 09:07:36,366 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 31 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 09:07:39,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5168750.0, ans=0.1 2024-08-21 09:08:02,970 INFO [train_multi_KD3.py:845] (0/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-21 09:08:03,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5168850.0, ans=0.1 2024-08-21 09:08:15,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.75 vs. limit=15.0 2024-08-21 09:08:33,786 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 22 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-21 09:08:37,880 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.825e+05 2024-08-21 09:09:00,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=5169050.0, ans=0.2 2024-08-21 09:09:16,506 INFO [train_multi_KD3.py:845] (0/4) A total of 70 cuts. 11 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-21 09:09:31,227 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13150, loss[loss=0.1008, beats_loss=0.006887, ecapa_loss=0.0001505, whisper_loss=0.09237, over 16809.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01064, ecapa_loss=0.0001377, whisper_loss=0.08817, over 3817208.68 frames. 
], batch size: 65, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:09:36,796 WARNING [optim.py:496] (0/4) Scaling gradients by 0.0913950502872467, model_norm_threshold=51.30171203613281 2024-08-21 09:09:36,979 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.527e+05, grad_sumsq=4.631e+04, orig_rms_sq=3.298e+00 2024-08-21 09:10:02,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=5169350.0, ans=0.125 2024-08-21 09:10:10,855 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.891e+00 2024-08-21 09:10:20,199 INFO [train_multi_KD3.py:845] (0/4) A total of 69 cuts. 25 from LS+wenet, 18 from Vox, 26 from AS 2024-08-21 09:10:30,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5169450.0, ans=0.0 2024-08-21 09:10:59,219 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.541e+01 2.235e+01 2.478e+01 2.768e+01 5.613e+02, threshold=4.956e+01, percent-clipped=2.0 2024-08-21 09:11:07,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5169550.0, ans=0.2 2024-08-21 09:11:37,593 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13200, loss[loss=0.09477, beats_loss=0.01001, ecapa_loss=0.0001538, whisper_loss=0.08322, over 17597.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01054, ecapa_loss=0.0001382, whisper_loss=0.08881, over 3786663.17 frames. ], batch size: 69, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:11:46,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=5169750.0, ans=0.2 2024-08-21 09:12:08,743 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
29 from LS+wenet, 25 from Vox, 39 from AS 2024-08-21 09:12:24,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5169850.0, ans=0.1 2024-08-21 09:12:29,271 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 from AS 2024-08-21 09:12:33,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5169950.0, ans=0.125 2024-08-21 09:13:36,857 INFO [train_multi_KD3.py:845] (0/4) A total of 61 cuts. 18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-21 09:13:41,449 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13250, loss[loss=0.0981, beats_loss=0.01097, ecapa_loss=0.0001258, whisper_loss=0.08587, over 21070.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01044, ecapa_loss=0.0001378, whisper_loss=0.08965, over 3794891.01 frames. ], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:14:08,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5170350.0, ans=0.125 2024-08-21 09:14:24,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5170350.0, ans=0.125 2024-08-21 09:14:29,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5170350.0, ans=0.1 2024-08-21 09:15:16,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.287e+01 2.552e+01 2.926e+01 1.195e+02, threshold=5.104e+01, percent-clipped=1.0 2024-08-21 09:15:34,516 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 
35 from LS+wenet, 22 from Vox, 37 from AS 2024-08-21 09:15:40,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5170650.0, ans=0.035 2024-08-21 09:15:41,977 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 15 from LS+wenet, 22 from Vox, 21 from AS 2024-08-21 09:15:53,597 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13300, loss[loss=0.1105, beats_loss=0.01154, ecapa_loss=0.0001092, whisper_loss=0.09791, over 22985.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001385, whisper_loss=0.08958, over 3786987.92 frames. ], batch size: 90, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:16:05,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-08-21 09:16:11,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5170750.0, ans=0.1 2024-08-21 09:16:28,454 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 from AS 2024-08-21 09:16:30,871 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 28 from LS+wenet, 22 from Vox, 30 from AS 2024-08-21 09:16:36,293 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 18 from LS+wenet, 31 from Vox, 38 from AS 2024-08-21 09:16:56,720 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 28 from LS+wenet, 29 from Vox, 26 from AS 2024-08-21 09:17:19,806 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 34 from LS+wenet, 14 from Vox, 42 from AS 2024-08-21 09:17:23,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=5171050.0, ans=0.0 2024-08-21 09:17:30,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. 
limit=10.0 2024-08-21 09:17:46,930 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 from AS 2024-08-21 09:17:47,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5171150.0, ans=0.125 2024-08-21 09:17:58,195 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13350, loss[loss=0.0974, beats_loss=0.01118, ecapa_loss=0.0001494, whisper_loss=0.08473, over 20197.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001394, whisper_loss=0.08968, over 3800874.46 frames. ], batch size: 84, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:18:07,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=5171250.0, ans=0.0 2024-08-21 09:18:07,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5171250.0, ans=0.0 2024-08-21 09:18:45,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5171450.0, ans=0.0 2024-08-21 09:19:09,488 INFO [train_multi_KD3.py:845] (0/4) A total of 75 cuts. 19 from LS+wenet, 23 from Vox, 33 from AS 2024-08-21 09:19:09,781 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-21 09:19:11,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5171550.0, ans=0.1 2024-08-21 09:19:16,287 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 
37 from LS+wenet, 23 from Vox, 32 from AS 2024-08-21 09:19:16,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5171550.0, ans=0.125 2024-08-21 09:19:23,101 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.340e+01 2.564e+01 2.896e+01 2.938e+02, threshold=5.128e+01, percent-clipped=2.0 2024-08-21 09:19:28,685 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 16 from LS+wenet, 20 from Vox, 24 from AS 2024-08-21 09:19:31,677 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 from AS 2024-08-21 09:19:38,751 WARNING [optim.py:496] (0/4) Scaling gradients by 0.09627310186624527, model_norm_threshold=51.2801628112793 2024-08-21 09:19:38,921 INFO [optim.py:564] (0/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.670e+04, grad_sumsq=3.670e+04, orig_rms_sq=1.000e+00 2024-08-21 09:19:48,112 INFO [train_multi_KD3.py:845] (0/4) A total of 94 cuts. 31 from LS+wenet, 20 from Vox, 43 from AS 2024-08-21 09:19:48,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=5171650.0, ans=0.125 2024-08-21 09:19:54,539 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 from AS 2024-08-21 09:19:59,574 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13400, loss[loss=0.1117, beats_loss=0.009943, ecapa_loss=0.0001474, whisper_loss=0.1003, over 18919.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01033, ecapa_loss=0.0001398, whisper_loss=0.09013, over 3836521.91 frames. 
], batch size: 77, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:20:04,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5171750.0, ans=0.1 2024-08-21 09:20:42,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5171850.0, ans=0.1 2024-08-21 09:21:01,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-08-21 09:21:01,780 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 19 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-21 09:21:11,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5172050.0, ans=0.07 2024-08-21 09:21:22,000 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-21 09:21:29,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=5172050.0, ans=0.07 2024-08-21 09:21:37,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5172150.0, ans=10.0 2024-08-21 09:21:41,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5172150.0, ans=0.125 2024-08-21 09:21:59,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5172250.0, ans=0.125 2024-08-21 09:22:00,644 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13450, loss[loss=0.08436, beats_loss=0.01087, ecapa_loss=0.0001437, whisper_loss=0.07205, over 17144.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01028, ecapa_loss=0.0001396, whisper_loss=0.08999, over 3777194.40 frames. 
], batch size: 68, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:22:03,642 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 27 from LS+wenet, 15 from Vox, 48 from AS 2024-08-21 09:22:03,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5172250.0, ans=0.125 2024-08-21 09:22:06,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.44 vs. limit=10.0 2024-08-21 09:22:11,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5172250.0, ans=0.025 2024-08-21 09:22:15,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=5172250.0, ans=0.0 2024-08-21 09:22:19,096 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 18 from LS+wenet, 23 from Vox, 25 from AS 2024-08-21 09:22:43,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=5172450.0, ans=0.0 2024-08-21 09:23:00,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5172450.0, ans=0.1 2024-08-21 09:23:19,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.314e+01 2.499e+01 2.868e+01 5.327e+02, threshold=4.997e+01, percent-clipped=2.0 2024-08-21 09:23:38,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5172650.0, ans=0.1 2024-08-21 09:23:43,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2024-08-21 09:23:44,954 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-21 09:23:45,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=5172650.0, ans=0.125 2024-08-21 09:23:54,381 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13500, loss[loss=0.1203, beats_loss=0.006941, ecapa_loss=0.0001944, whisper_loss=0.1114, over 16361.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01032, ecapa_loss=0.0001394, whisper_loss=0.09101, over 3836949.57 frames. ], batch size: 67, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:23:54,546 INFO [train_multi_KD3.py:845] (0/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 from AS 2024-08-21 09:23:54,752 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 09:24:02,800 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 17 from LS+wenet, 26 from Vox, 29 from AS 2024-08-21 09:24:27,030 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 from AS 2024-08-21 09:24:36,832 INFO [train_multi_KD3.py:845] (0/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS 2024-08-21 09:24:46,273 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS 2024-08-21 09:24:46,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5172950.0, ans=0.1 2024-08-21 09:24:59,694 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 26 from LS+wenet, 21 from Vox, 33 from AS 2024-08-21 09:25:06,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5173050.0, ans=0.125 2024-08-21 09:25:27,014 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 
20 from LS+wenet, 8 from Vox, 30 from AS 2024-08-21 09:25:29,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5173150.0, ans=0.0 2024-08-21 09:25:50,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=5173250.0, ans=0.0 2024-08-21 09:25:52,237 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13550, loss[loss=0.1006, beats_loss=0.008827, ecapa_loss=0.0001274, whisper_loss=0.09049, over 15355.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01027, ecapa_loss=0.0001393, whisper_loss=0.09041, over 3785003.66 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:25:56,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-08-21 09:26:04,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5173250.0, ans=0.125 2024-08-21 09:26:06,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5173250.0, ans=0.1 2024-08-21 09:26:16,806 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 23 from LS+wenet, 18 from Vox, 24 from AS 2024-08-21 09:26:22,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=12.0 2024-08-21 09:26:31,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5173350.0, ans=0.1 2024-08-21 09:26:50,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.34 vs. 
limit=15.0 2024-08-21 09:27:10,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5173550.0, ans=0.0 2024-08-21 09:27:14,799 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 20 from LS+wenet, 26 from Vox, 39 from AS 2024-08-21 09:27:16,464 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.212e+01 2.430e+01 2.813e+01 4.061e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 09:27:17,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5173550.0, ans=0.125 2024-08-21 09:27:19,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-08-21 09:27:21,560 INFO [train_multi_KD3.py:845] (0/4) A total of 50 cuts. 15 from LS+wenet, 13 from Vox, 22 from AS 2024-08-21 09:27:53,918 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13600, loss[loss=0.08977, beats_loss=0.01178, ecapa_loss=0.0001458, whisper_loss=0.07654, over 17023.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01027, ecapa_loss=0.0001389, whisper_loss=0.09008, over 3761469.50 frames. ], batch size: 74, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:28:02,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5173750.0, ans=0.125 2024-08-21 09:28:24,252 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 from AS 2024-08-21 09:29:05,155 INFO [train_multi_KD3.py:845] (0/4) A total of 85 cuts. 28 from LS+wenet, 17 from Vox, 40 from AS 2024-08-21 09:29:59,838 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13650, loss[loss=0.1127, beats_loss=0.009353, ecapa_loss=0.000132, whisper_loss=0.102, over 15847.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.0102, ecapa_loss=0.0001396, whisper_loss=0.09028, over 3749811.94 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:30:17,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5174250.0, ans=0.125 2024-08-21 09:30:17,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5174250.0, ans=0.1 2024-08-21 09:30:42,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=5174350.0, ans=0.125 2024-08-21 09:30:43,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=5174350.0, ans=0.04949747468305833 2024-08-21 09:31:26,422 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.278e+01 2.472e+01 2.664e+01 8.830e+01, threshold=4.945e+01, percent-clipped=1.0 2024-08-21 09:31:37,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=5174650.0, ans=0.0 2024-08-21 09:32:04,221 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13700, loss[loss=0.1044, beats_loss=0.01152, ecapa_loss=0.0001295, whisper_loss=0.09157, over 20824.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01027, ecapa_loss=0.0001402, whisper_loss=0.08973, over 3741385.83 frames. ], batch size: 82, lr: 1.73e-03, grad_scale: 1.152921504606847e+18 2024-08-21 09:32:05,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=12.0 2024-08-21 09:32:35,570 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 
28 from LS+wenet, 22 from Vox, 40 from AS 2024-08-21 09:32:50,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=5174950.0, ans=0.0 2024-08-21 09:33:28,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-21 09:33:35,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5175050.0, ans=0.0 2024-08-21 09:33:35,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5175050.0, ans=0.0 2024-08-21 09:33:44,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5175150.0, ans=0.0 2024-08-21 09:33:48,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5175150.0, ans=0.1 2024-08-21 09:33:48,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=12.0 2024-08-21 09:33:50,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=5175150.0, ans=0.125 2024-08-21 09:33:53,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5175150.0, ans=0.0 2024-08-21 09:34:04,756 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13750, loss[loss=0.1071, beats_loss=0.01094, ecapa_loss=0.0001367, whisper_loss=0.09478, over 21025.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001396, whisper_loss=0.09009, over 3794041.64 frames. 
], batch size: 85, lr: 1.73e-03, grad_scale: 1.152921504606847e+18 2024-08-21 09:34:17,412 INFO [train_multi_KD3.py:845] (0/4) A total of 67 cuts. 22 from LS+wenet, 20 from Vox, 25 from AS 2024-08-21 09:34:51,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5175450.0, ans=0.0 2024-08-21 09:35:05,689 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 25 from LS+wenet, 36 from Vox, 31 from AS 2024-08-21 09:35:27,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.292e+01 2.541e+01 2.786e+01 7.539e+01, threshold=5.082e+01, percent-clipped=3.0 2024-08-21 09:35:30,016 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 from AS 2024-08-21 09:35:48,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=5175650.0, ans=0.125 2024-08-21 09:36:06,929 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13800, loss[loss=0.103, beats_loss=0.007712, ecapa_loss=0.0001314, whisper_loss=0.094, over 19276.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01032, ecapa_loss=0.0001395, whisper_loss=0.09006, over 3776157.01 frames. ], batch size: 76, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:36:36,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5175850.0, ans=0.1 2024-08-21 09:36:42,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5175850.0, ans=0.125 2024-08-21 09:37:14,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=5175950.0, ans=0.0 2024-08-21 09:37:28,103 INFO [train_multi_KD3.py:845] (0/4) A total of 51 cuts. 
21 from LS+wenet, 15 from Vox, 15 from AS 2024-08-21 09:37:33,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-08-21 09:37:55,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=5176150.0, ans=0.125 2024-08-21 09:37:55,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0 2024-08-21 09:38:15,666 INFO [train_multi_KD3.py:845] (0/4) A total of 53 cuts. 14 from LS+wenet, 12 from Vox, 27 from AS 2024-08-21 09:38:21,660 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13850, loss[loss=0.09647, beats_loss=0.01151, ecapa_loss=0.0001212, whisper_loss=0.08375, over 23214.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001386, whisper_loss=0.08981, over 3783972.81 frames. ], batch size: 93, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:38:59,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=5176350.0, ans=10.0 2024-08-21 09:39:07,735 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 from AS 2024-08-21 09:39:10,053 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 24 from LS+wenet, 32 from Vox, 34 from AS 2024-08-21 09:39:18,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=5176450.0, ans=12.0 2024-08-21 09:39:38,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.79 vs. 
limit=12.0 2024-08-21 09:39:42,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=5176550.0, ans=0.125 2024-08-21 09:39:58,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.312e+01 2.430e+01 2.795e+01 3.774e+01, threshold=4.861e+01, percent-clipped=0.0 2024-08-21 09:40:11,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5176650.0, ans=0.125 2024-08-21 09:40:29,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=5176650.0, ans=22.5 2024-08-21 09:40:33,016 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13900, loss[loss=0.098, beats_loss=0.01082, ecapa_loss=0.0001215, whisper_loss=0.08597, over 19670.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001388, whisper_loss=0.09027, over 3798412.10 frames. ], batch size: 80, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:40:56,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.64 vs. 
limit=15.0 2024-08-21 09:41:46,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5177050.0, ans=0.125 2024-08-21 09:41:46,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=5177050.0, ans=0.125 2024-08-21 09:41:56,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=5177050.0, ans=0.2 2024-08-21 09:42:01,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5177050.0, ans=0.0 2024-08-21 09:42:17,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=5177150.0, ans=0.025 2024-08-21 09:42:18,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=5177150.0, ans=0.5 2024-08-21 09:42:31,824 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 13950, loss[loss=0.09233, beats_loss=0.01195, ecapa_loss=0.0001325, whisper_loss=0.07906, over 23199.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.000138, whisper_loss=0.0904, over 3827474.54 frames. 
], batch size: 94, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:42:49,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=5177250.0, ans=0.0 2024-08-21 09:43:22,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=5177450.0, ans=0.2 2024-08-21 09:43:26,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5177450.0, ans=0.1 2024-08-21 09:43:42,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=5177550.0, ans=0.125 2024-08-21 09:43:42,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=5177550.0, ans=0.0 2024-08-21 09:43:58,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.301e+01 2.643e+01 2.947e+01 4.607e+01, threshold=5.286e+01, percent-clipped=0.0 2024-08-21 09:44:07,781 INFO [train_multi_KD3.py:845] (0/4) A total of 82 cuts. 29 from LS+wenet, 23 from Vox, 30 from AS 2024-08-21 09:44:31,676 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14000, loss[loss=0.1046, beats_loss=0.01145, ecapa_loss=0.0001072, whisper_loss=0.09209, over 22910.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.000138, whisper_loss=0.09038, over 3841534.04 frames. ], batch size: 87, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:44:58,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2024-08-21 09:45:03,428 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 
23 from LS+wenet, 19 from Vox, 31 from AS 2024-08-21 09:45:16,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5177950.0, ans=0.1 2024-08-21 09:45:23,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2024-08-21 09:45:47,061 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 from AS 2024-08-21 09:45:54,173 INFO [train_multi_KD3.py:845] (0/4) A total of 72 cuts. 14 from LS+wenet, 15 from Vox, 43 from AS 2024-08-21 09:46:06,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=5178150.0, ans=0.0 2024-08-21 09:46:22,103 INFO [train_multi_KD3.py:845] (0/4) A total of 83 cuts. 23 from LS+wenet, 28 from Vox, 32 from AS 2024-08-21 09:46:23,931 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14050, loss[loss=0.0937, beats_loss=0.009467, ecapa_loss=0.0001458, whisper_loss=0.08277, over 20142.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001379, whisper_loss=0.09072, over 3827298.43 frames. ], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:46:39,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5178250.0, ans=0.125 2024-08-21 09:46:57,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5178350.0, ans=0.1 2024-08-21 09:46:58,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. 
limit=6.0 2024-08-21 09:47:03,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=5178350.0, ans=0.2 2024-08-21 09:47:31,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=5178450.0, ans=0.05 2024-08-21 09:47:41,935 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 26 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-21 09:47:48,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.318e+01 2.539e+01 2.809e+01 4.112e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-21 09:48:16,456 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14100, loss[loss=0.1079, beats_loss=0.009618, ecapa_loss=0.0001439, whisper_loss=0.09683, over 13867.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01034, ecapa_loss=0.000138, whisper_loss=0.09041, over 3837743.83 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:48:18,229 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 16 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-21 09:48:28,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5178750.0, ans=0.2 2024-08-21 09:48:29,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=5178750.0, ans=0.125 2024-08-21 09:48:32,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=5178750.0, ans=0.125 2024-08-21 09:48:54,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=12.0 2024-08-21 09:49:01,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.37 vs. 
limit=15.0 2024-08-21 09:49:36,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5179150.0, ans=0.2 2024-08-21 09:49:46,804 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 21 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-21 09:50:01,599 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14150, loss[loss=0.1155, beats_loss=0.00848, ecapa_loss=0.000158, whisper_loss=0.1054, over 23444.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001378, whisper_loss=0.09018, over 3832340.36 frames. ], batch size: 94, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:50:03,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5179250.0, ans=0.125 2024-08-21 09:50:05,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=5179250.0, ans=0.2 2024-08-21 09:50:06,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5179250.0, ans=0.125 2024-08-21 09:50:41,923 INFO [train_multi_KD3.py:845] (0/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-21 09:50:44,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.04 vs. 
limit=10.0 2024-08-21 09:51:15,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5179550.0, ans=0.125 2024-08-21 09:51:19,380 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.247e+01 2.512e+01 2.809e+01 5.073e+01, threshold=5.024e+01, percent-clipped=0.0 2024-08-21 09:51:31,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5179650.0, ans=0.0 2024-08-21 09:51:37,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5179650.0, ans=0.0 2024-08-21 09:51:46,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=12.0 2024-08-21 09:51:48,713 INFO [train_multi_KD3.py:845] (0/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-21 09:51:50,984 INFO [train_multi_KD3.py:845] (0/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-21 09:51:52,972 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14200, loss[loss=0.1018, beats_loss=0.0103, ecapa_loss=0.0001519, whisper_loss=0.08995, over 14825.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001377, whisper_loss=0.09091, over 3824903.38 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:52:09,075 INFO [train_multi_KD3.py:845] (0/4) A total of 66 cuts. 14 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-21 09:52:30,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.71 vs. 
limit=22.5 2024-08-21 09:52:45,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=5179950.0, ans=0.0 2024-08-21 09:52:50,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5179950.0, ans=0.125 2024-08-21 09:53:32,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5180150.0, ans=0.125 2024-08-21 09:53:49,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5180150.0, ans=0.1 2024-08-21 09:53:56,778 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14250, loss[loss=0.1246, beats_loss=0.005942, ecapa_loss=0.0001302, whisper_loss=0.1174, over 14933.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001377, whisper_loss=0.09013, over 3774273.70 frames. 
], batch size: 54, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:54:21,214 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.042e-02 2024-08-21 09:54:26,219 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=7.985e+01 2024-08-21 09:54:30,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=5180350.0, ans=0.0 2024-08-21 09:54:30,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=5180350.0, ans=0.5 2024-08-21 09:54:41,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=5180350.0, ans=0.95 2024-08-21 09:55:22,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=5180550.0, ans=0.125 2024-08-21 09:55:26,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.191e+01 2.440e+01 2.689e+01 6.038e+01, threshold=4.881e+01, percent-clipped=1.0 2024-08-21 09:55:39,466 INFO [train_multi_KD3.py:845] (0/4) A total of 88 cuts. 31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-21 09:55:39,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5180650.0, ans=0.1 2024-08-21 09:56:03,529 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14300, loss[loss=0.1261, beats_loss=0.009732, ecapa_loss=0.0001358, whisper_loss=0.115, over 19003.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01032, ecapa_loss=0.0001372, whisper_loss=0.09045, over 3772004.41 frames. ], batch size: 73, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:57:22,304 INFO [train_multi_KD3.py:845] (0/4) A total of 59 cuts. 
15 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-21 09:57:23,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2024-08-21 09:57:36,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=5181050.0, ans=0.2 2024-08-21 09:57:50,486 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-21 09:57:55,666 INFO [train_multi_KD3.py:845] (0/4) A total of 76 cuts. 27 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-21 09:58:09,820 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14350, loss[loss=0.09032, beats_loss=0.01002, ecapa_loss=0.0001386, whisper_loss=0.07891, over 18186.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01029, ecapa_loss=0.0001367, whisper_loss=0.09087, over 3765026.40 frames. ], batch size: 76, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 09:58:17,495 INFO [train_multi_KD3.py:845] (0/4) A total of 62 cuts. 15 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-21 09:58:25,781 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-21 09:58:41,553 INFO [train_multi_KD3.py:845] (0/4) A total of 80 cuts. 
22 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-21 09:58:41,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=5181350.0, ans=0.125 2024-08-21 09:58:45,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5181350.0, ans=0.125 2024-08-21 09:59:21,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5181450.0, ans=0.1 2024-08-21 09:59:34,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5181550.0, ans=0.125 2024-08-21 09:59:37,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5181550.0, ans=0.125 2024-08-21 09:59:44,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.249e+01 2.480e+01 2.767e+01 3.884e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-21 09:59:57,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-08-21 10:00:06,611 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 30 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 10:00:19,258 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14400, loss[loss=0.07248, beats_loss=0.01447, ecapa_loss=0.0001311, whisper_loss=0.05671, over 17119.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01027, ecapa_loss=0.0001368, whisper_loss=0.09083, over 3762065.14 frames. 
], batch size: 74, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:00:46,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5181850.0, ans=0.125 2024-08-21 10:00:50,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5181850.0, ans=0.0 2024-08-21 10:01:02,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=5181850.0, ans=0.0 2024-08-21 10:01:26,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=5181950.0, ans=0.125 2024-08-21 10:01:36,597 INFO [train_multi_KD3.py:845] (0/4) A total of 77 cuts. 24 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-21 10:01:36,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=5182050.0, ans=0.0 2024-08-21 10:01:41,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=5182050.0, ans=0.125 2024-08-21 10:01:57,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5182050.0, ans=0.125 2024-08-21 10:02:04,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5182150.0, ans=0.1 2024-08-21 10:02:29,005 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14450, loss[loss=0.07091, beats_loss=0.0139, ecapa_loss=0.0001229, whisper_loss=0.05578, over 20125.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01033, ecapa_loss=0.0001365, whisper_loss=0.09039, over 3760028.79 frames. 
], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:03:31,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-21 10:03:42,282 INFO [train_multi_KD3.py:845] (0/4) A total of 56 cuts. 12 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-21 10:04:03,211 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.252e+01 2.493e+01 2.789e+01 4.722e+01, threshold=4.987e+01, percent-clipped=0.0 2024-08-21 10:04:13,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=5182650.0, ans=0.09899494936611666 2024-08-21 10:04:40,406 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14500, loss[loss=0.09697, beats_loss=0.01131, ecapa_loss=0.0001322, whisper_loss=0.08434, over 20577.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001371, whisper_loss=0.09034, over 3782498.54 frames. ], batch size: 83, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:04:40,742 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 26 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-21 10:04:43,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=5182750.0, ans=0.0 2024-08-21 10:04:46,725 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 38 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-21 10:04:57,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=5182750.0, ans=0.125 2024-08-21 10:05:25,593 INFO [train_multi_KD3.py:845] (0/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-21 10:05:31,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5182950.0, ans=0.1 2024-08-21 10:05:37,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=5182950.0, ans=0.035 2024-08-21 10:06:09,413 INFO [train_multi_KD3.py:845] (0/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-21 10:06:26,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=5183150.0, ans=0.2 2024-08-21 10:06:50,544 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14550, loss[loss=0.1005, beats_loss=0.009929, ecapa_loss=0.0001306, whisper_loss=0.08926, over 18235.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01032, ecapa_loss=0.0001367, whisper_loss=0.08961, over 3776502.91 frames. ], batch size: 70, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:06:53,265 INFO [train_multi_KD3.py:845] (0/4) A total of 65 cuts. 16 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-21 10:07:46,278 INFO [train_multi_KD3.py:845] (0/4) A total of 52 cuts. 16 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-21 10:08:25,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.316e+01 2.552e+01 2.879e+01 5.154e+01, threshold=5.103e+01, percent-clipped=1.0 2024-08-21 10:08:37,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5183650.0, ans=0.1 2024-08-21 10:08:46,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5183650.0, ans=0.125 2024-08-21 10:09:01,750 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14600, loss[loss=0.1019, beats_loss=0.008533, ecapa_loss=0.0001307, whisper_loss=0.09204, over 13811.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001357, whisper_loss=0.08986, over 3793288.52 frames. ], batch size: 51, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:09:02,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5183750.0, ans=0.125 2024-08-21 10:09:04,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=5183750.0, ans=0.0 2024-08-21 10:09:04,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5183750.0, ans=0.1 2024-08-21 10:09:08,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5183750.0, ans=0.0 2024-08-21 10:09:40,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=5183850.0, ans=0.0 2024-08-21 10:09:52,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=12.0 2024-08-21 10:10:12,936 INFO [train_multi_KD3.py:845] (0/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-21 10:10:15,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=12.0 2024-08-21 10:10:30,429 INFO [train_multi_KD3.py:845] (0/4) A total of 92 cuts. 20 from LS+wenet, 24 from Vox, 48 fro AS 2024-08-21 10:10:59,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5184150.0, ans=0.125 2024-08-21 10:11:03,541 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14650, loss[loss=0.1042, beats_loss=0.009692, ecapa_loss=0.0001132, whisper_loss=0.09337, over 22752.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01031, ecapa_loss=0.0001353, whisper_loss=0.09017, over 3836111.54 frames. ], batch size: 88, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:11:23,173 INFO [train_multi_KD3.py:845] (0/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-21 10:11:28,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5184350.0, ans=0.125 2024-08-21 10:11:31,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5184350.0, ans=0.1 2024-08-21 10:11:46,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-21 10:12:05,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5184450.0, ans=0.1 2024-08-21 10:12:09,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=5184550.0, ans=0.0 2024-08-21 10:12:10,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5184550.0, ans=0.125 2024-08-21 10:12:26,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.280e+01 2.543e+01 2.836e+01 3.661e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-21 10:12:27,054 INFO [train_multi_KD3.py:845] (0/4) A total of 55 cuts. 15 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-21 10:12:30,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=5184550.0, ans=0.125 2024-08-21 10:12:37,717 INFO [train_multi_KD3.py:845] (0/4) A total of 93 cuts. 
22 from LS+wenet, 20 from Vox, 51 fro AS 2024-08-21 10:13:01,681 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14700, loss[loss=0.1012, beats_loss=0.01042, ecapa_loss=0.0001494, whisper_loss=0.08927, over 22759.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.000135, whisper_loss=0.09043, over 3841298.68 frames. ], batch size: 93, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:13:17,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5184750.0, ans=0.125 2024-08-21 10:13:19,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-08-21 10:13:43,720 INFO [train_multi_KD3.py:845] (0/4) A total of 74 cuts. 23 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-21 10:13:58,889 INFO [train_multi_KD3.py:845] (0/4) A total of 58 cuts. 17 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-21 10:13:59,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=5184950.0, ans=0.125 2024-08-21 10:14:14,098 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-21 10:14:49,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.05 vs. limit=10.0 2024-08-21 10:15:04,802 INFO [train_multi_KD3.py:1117] (0/4) Epoch 35, batch 14750, loss[loss=0.09469, beats_loss=0.01015, ecapa_loss=0.0001047, whisper_loss=0.08349, over 16123.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001355, whisper_loss=0.09013, over 3831659.71 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 5.764607523034235e+17 2024-08-21 10:15:15,436 INFO [train_multi_KD3.py:845] (0/4) A total of 87 cuts. 
27 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-21 10:15:29,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=5185350.0, ans=0.125 2024-08-21 10:16:17,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=5185450.0, ans=0.95 2024-08-21 10:16:19,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=5185450.0, ans=0.0 2024-08-21 10:16:25,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=5185550.0, ans=0.2 2024-08-21 10:16:39,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.269e+01 2.538e+01 2.783e+01 3.650e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-21 10:16:57,396 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/epoch-35.pt 2024-08-21 10:17:06,985 INFO [train_multi_KD3.py:1466] (0/4) Done!
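The progress records above (the `Epoch …, batch …, tot_loss[loss=…]` lines from train_multi_KD3.py) can be scraped into a loss curve for plotting or comparison across runs. A minimal sketch follows; the regex and the `parse_tot_loss` helper are illustrative assumptions and not part of icefall itself.

```python
import re

# Hypothetical helper (not part of icefall): pull (epoch, batch, tot_loss)
# out of progress lines shaped like
#   "... Epoch 35, batch 14000, loss[...], tot_loss[loss=0.1022, ...] ..."
PROGRESS = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([0-9.]+)")

def parse_tot_loss(lines):
    """Return a list of (epoch, batch, tot_loss) tuples from raw log lines."""
    out = []
    for line in lines:
        m = PROGRESS.search(line)
        if m:  # skip ScheduledFloat / Whitening / cut-count records
            out.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return out
```

Feeding it the lines of this log yields one point per 50-batch reporting interval (14000, 14050, ...), which is enough to eyeball whether `tot_loss` is still drifting downward late in epoch 35.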